Abstract

Unveiling a fundamental link between information theory and estimation theory, the I-MMSE relation by Guo, Shamai and Verdú [14], together with its numerous extensions, has great theoretical significance and various practical applications. On the other hand, its influence to date has been restricted to channels without feedback or memory, owing to the absence of extensions to such channels. In this paper, we propose extensions of the I-MMSE relation to discrete-time and continuous-time Gaussian channels with feedback and/or memory. Our approach is based on a very simple observation, which can also be applied in other scenarios, for instance, to give a simple and direct proof of the classical de Bruijn identity.


1 Introduction 

Consider the following discrete-time memoryless Gaussian channel 

$$Y = \sqrt{\mathrm{snr}}\,X + Z, \qquad (1)$$

where snr denotes the signal-to-noise ratio of the channel, $X$ and $Y$ denote the input and output of the channel, respectively, and the standard normally distributed noise $Z$ is independent of $X$. An interesting recent result by Guo, Shamai and Verdú [14] states that for any channel input $X$ with $E[X^2] < \infty$,

$$\frac{d}{d\,\mathrm{snr}} I(X;Y) = \frac{1}{2} E\big[(X - E[X \mid Y])^2\big], \qquad (2)$$

where the left-hand side is the derivative of $I(X;Y)$ with respect to snr, and the right-hand side is half of the so-called minimum mean-square error (MMSE), which corresponds to the best estimation of $X$ given the observation $Y$. The I-MMSE relation (2) carries over verbatim to linear vector Gaussian channels and has been widely extended to continuous-time Gaussian channels [14], abstract Gaussian channels [?], additive channels [45], arbitrary channels [31], derivatives with respect to arbitrary parameterizations [?], higher-order derivatives [22], and so on.
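As a quick numerical sanity check of (2), consider the hypothetical test case of an equiprobable binary input $X \in \{-1, +1\}$ (an illustrative choice, not one of the paper's examples), for which $E[X \mid Y = y] = \tanh(\sqrt{\mathrm{snr}}\,y)$. The sketch below computes $I(X;Y) = H(Y) - H(Z)$ and the MMSE by one-dimensional quadrature and compares a central finite-difference derivative of $I$ with half the MMSE; the helper names, integration range, grid size and step size are all assumptions made for illustration.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / SQRT_2PI


def trapezoid(f, lo=-14.0, hi=14.0, n=6000):
    """Composite trapezoid rule on [lo, hi]; the integrands below decay like a Gaussian."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def f_y(y, a):
    """Density of Y = a*X + Z with X uniform on {-1, +1} and Z ~ N(0, 1)."""
    return 0.5 * (phi(y - a) + phi(y + a))


def mutual_info(snr):
    """I(X;Y) in nats, computed as h(Y) - h(Z)."""
    a = math.sqrt(snr)
    h_y = trapezoid(lambda y: -f_y(y, a) * math.log(f_y(y, a)))
    h_z = 0.5 * math.log(2.0 * math.pi * math.e)
    return h_y - h_z


def mmse(snr):
    """E[(X - E[X|Y])^2] = E[1 - tanh^2(sqrt(snr) * Y)] for the binary input."""
    a = math.sqrt(snr)
    return trapezoid(lambda y: f_y(y, a) * (1.0 - math.tanh(a * y) ** 2))


snr, step = 1.0, 1e-4
lhs = (mutual_info(snr + step) - mutual_info(snr - step)) / (2 * step)
rhs = 0.5 * mmse(snr)
print(f"dI/dsnr = {lhs:.6f}, mmse/2 = {rhs:.6f}")
```

The agreement is limited only by the finite-difference step; the trapezoid rule is extremely accurate here because the integrands are smooth and decay like a Gaussian.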

Unveiling an important link between information theory and estimation theory, the I-MMSE relation and its numerous extensions are of fundamental significance to relevant areas in both fields and have exerted far-reaching influence over a wide range of topics. Representative applications include, but are not limited to, power allocation of parallel Gaussian channels [26], analysis of extrinsic information of code ensembles [34], Gaussian broadcast channels [?], Gaussian wiretap channels [?], Gaussian interference channels [4], interference alignment [?], and a simple proof of the classical entropy power inequality [?]. For a comprehensive reference on the applications of the I-MMSE relation and its extensions, we refer to [37].

On the other hand, all the applications of the I-MMSE relation to date have been restricted to channels without feedback or memory, due to the lack of extensions of the I-MMSE relation to such channels. In this regard, a “plain” generalization of the original I-MMSE relation to feedback channels should not be expected, as noted in [10], where an example is given to show that the exact I-MMSE relation fails to hold for a certain continuous-time feedback channel. In this paper, we remedy the situation with some explicit correction terms (which vanish if the channel has neither feedback nor memory) and extend the I-MMSE relation to channels with feedback or memory. Although the I-MMSE relation has been examined from a number of perspectives (see its multiple proofs in [14]), our approach is novel and powerful. As a matter of fact, other than recovering and extending the I-MMSE relation, our approach can be applied elsewhere, for instance, to yield a simple and direct proof of the classical de Bruijn identity [38, 5]; see Section 2.2.

Our approach is based on a surprisingly simple idea, which can be roughly stated as follows: before taking the derivative of an information-theoretic quantity with respect to certain parameters, we represent it as an expectation with respect to a probability space independent of those parameters. For illustrative purposes, in what follows, we consider the discrete-time Gaussian channel in (1), review a “conventional” proof of (2) in [14], and compare it with ours.

First, note that for the channel in (1), since $H(Y \mid X) = H(Z)$ does not depend on snr, taking the derivative of $I(X;Y)$ is equivalent to taking that of $H(Y)$, which can be written as the expectation of $-\log f_Y(Y)$:

$$H(Y) = -E[\log f_Y(Y)].$$

In their fourth proof of (2), the authors of [14] choose the probability space, with respect to which the above expectation is taken, to be the sample space of $Y$ (with the naturally induced measure), which obviously depends on snr. With respect to this probability space, $H(Y)$ is naturally expressed as

$$H(Y) = -\int_{\mathbb{R}} f_Y(y) \log f_Y(y)\,dy.$$

Then, under some mild assumptions, the derivative of $H(Y)$ with respect to snr can be moved inside the integral, and (2) follows from integration by parts and other straightforward computations.

Under our approach, we instead choose a probability space independent of snr. For example, choosing the probability space to be the sample space of $(X, Z)$, we express $H(Y)$ as

$$H(Y) = -\int_{\mathbb{R}} \int_{\mathbb{R}} f_X(x) f_Z(z) \log f_Y(\sqrt{\mathrm{snr}}\,x + z)\,dx\,dz.$$

It turns out that such a seemingly innocent shift of viewpoint renders the subsequent computations leading to (2) rather simple and direct; most importantly, when applied to channels with feedback or memory, it naturally leads to extensions of the I-MMSE relation.
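The following sketch illustrates this change of viewpoint on the same hypothetical binary-input example ($X$ uniform on $\{-1, +1\}$, written with $\rho = \sqrt{\mathrm{snr}}$ as in Section 2.1). The samples $(X, Z)$ are drawn once from a fixed seed and held fixed while $\rho$ varies, so $-\log f_Y(\rho X + Z)$ can be differentiated sample by sample; for this input, $-\log f_Y(y) = y^2/2 + \rho^2/2 - \log\cosh(\rho y) + \log\sqrt{2\pi}$. Its sample average should approach $\rho$ times the MMSE, as (4) in Section 2.1 predicts. The sample size and seed are arbitrary choices.

```python
import math
import random


def pathwise_derivative(x, z, rho):
    """d/drho of -log f_Y(rho*x + z) with the sample point (x, z) held fixed.

    For X uniform on {-1, +1}:
        -log f_Y(y) = y^2/2 + rho^2/2 - log cosh(rho*y) + log sqrt(2*pi),
    and y = rho*x + z moves with rho, so dy/drho = x.
    """
    y = rho * x + z
    return x * y + rho - math.tanh(rho * y) * (y + rho * x)


def estimate(rho, n=200_000, seed=2024):
    """Average the pathwise derivative and rho * squared error on one fixed sample of (X, Z)."""
    rng = random.Random(seed)  # the sample space of (X, Z) does not depend on rho
    d_sum = 0.0
    err_sum = 0.0
    for _ in range(n):
        x = rng.choice((-1.0, 1.0))
        z = rng.gauss(0.0, 1.0)
        d_sum += pathwise_derivative(x, z, rho)
        y = rho * x + z
        err_sum += (x - math.tanh(rho * y)) ** 2  # E[X | Y = y] = tanh(rho * y)
    return d_sum / n, rho * err_sum / n


dH, rho_mmse = estimate(rho=1.0)
print(f"E[pathwise derivative] = {dH:.4f}, rho * mmse = {rho_mmse:.4f}")
```

Both averages are taken over the same samples, so their difference shrinks at the usual Monte Carlo rate.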
For instance, consider the discrete-time Gaussian channel with feedback

$$Y_i = \sqrt{\mathrm{snr}}\,X_i(M, Y_1^{i-1}) + Z_i, \quad i = 1, 2, \ldots, n,$$

where the channel input $X_i$ depends on the message $M$ and the previous channel outputs $Y_1^{i-1}$. Using the above-mentioned approach, we will obtain the following extension (see Remark 3.4) of the I-MMSE relation:


$$\frac{d}{d\,\mathrm{snr}} I(X_1^n \to Y_1^n) = \frac{1}{2} \sum_{i=1}^n E\big[(X_i - E[X_i \mid Y_1^n])^2\big] + \mathrm{snr} \sum_{i=1}^n E\left[(X_i - E[X_i \mid Y_1^n]) \frac{d}{d\,\mathrm{snr}} X_i\right], \qquad (3)$$

where $X_i$ is the abbreviated form of $X_i(M, Y_1^{i-1})$ and $I(X_1^n \to Y_1^n)$ is the directed information between $X_1^n$ and $Y_1^n$. Directed information is a generalization of mutual information to feedback channels, and the second term on the right-hand side of (3) is a correction term, which vanishes when $X_i$ does not depend on $Y_1^{i-1}$ (i.e., when there is no feedback); so (3) is indeed an extension of the I-MMSE relation (2) to discrete-time Gaussian channels with feedback. As elaborated later, the I-MMSE relation can also be extended to Gaussian channels, in either discrete time or continuous time, with feedback and/or memory.

The remainder of the paper is organized as follows. In Section 2, based on the proposed approach, we give a new proof of the I-MMSE relation for discrete-time Gaussian channels and a new proof of the classical de Bruijn identity. We present our extensions of the I-MMSE relation, the main results of this paper, in Sections 3 and 4, which are followed by an outlook on some promising future directions in Section 5.


2 New Proofs of Existing Results 

In this section, to further illustrate the idea of our approach, we give new proofs of some existing results: the original I-MMSE relation (2) and the classical de Bruijn identity. To enhance readability and emphasize the main idea, here and throughout the paper, we omit some technical details of checking the conditions required for interchanging differentiation and integration; these details are provided in the Appendices.

2.1 A new proof of the I-MMSE relation 

In this section, we consider the Gaussian channel specified in (1) and give a new proof of (2). Here and throughout the paper, we replace $\sqrt{\mathrm{snr}}$ with $\rho$ to avoid cumbersome notation during the computation; the derivative with respect to snr can be readily obtained by an application of the chain rule. Under the new notation, the channel (1) becomes

$$Y = \rho X + Z,$$

where $\rho \in \mathbb{R}_+$, and we only need to prove that

$$\frac{d}{d\rho} I(X;Y) = \rho\, E\big[(X - E[X \mid Y])^2\big]. \qquad (4)$$
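As a short check, assuming $\rho > 0$ so that $\rho = \sqrt{\mathrm{snr}}$ is a smooth reparametrization, the chain rule turns (4) back into (2):

```latex
\frac{d}{d\,\mathrm{snr}} I(X;Y)
  = \frac{d\rho}{d\,\mathrm{snr}} \cdot \frac{d}{d\rho} I(X;Y)
  = \frac{1}{2\rho} \cdot \rho\, E\big[(X - E[X \mid Y])^2\big]
  = \frac{1}{2}\, E\big[(X - E[X \mid Y])^2\big].
```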
Clearly, the conditional density of $Y$ given $X = x$ is $f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi}} e^{-(y - \rho x)^2/2}$, and the density function of $Y$ can be computed as

$$f_Y(y) = \int_{\mathbb{R}} f_{Y|X}(y|x) f_X(x)\,dx.$$


Since $Z$ is independent of $X$, we have

$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(Z),$$

which, together with the fact that $Z$ does not depend on $\rho$, implies that

$$\frac{d}{d\rho} I(X;Y) = -\frac{d}{d\rho} E[\log f_Y(Y)] = -E\left[\frac{\frac{d}{d\rho} f_Y(Y)}{f_Y(Y)}\right].$$

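Now, some straightforward computations yield

$$\begin{aligned} \frac{d}{d\rho} f_Y(Y) &= \frac{d}{d\rho} \int_{\mathbb{R}} f_{Y|X}(Y|x) f_X(x)\,dx \\ &= -\int_{\mathbb{R}} (\rho X + Z - \rho x)(X - x) f_{Y|X}(Y|x) f_X(x)\,dx \\ &= -f_Y(Y) \int_{\mathbb{R}} (\rho X + Z - \rho x)(X - x) f_{X|Y}(x|Y)\,dx. \end{aligned}$$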

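It then follows that

$$\begin{aligned} \frac{d}{d\rho} I(X;Y) &= E\left[\int_{\mathbb{R}} (Y - \rho x)(X - x) f_{X|Y}(x|Y)\,dx\right] \\ &= E\big[YX - Y E[X|Y] - \rho X E[X|Y] + \rho E[X^2|Y]\big] \\ &= E[YX] - E[YX] - E\big[\rho E^2[X|Y]\big] + E\big[\rho E[X^2|Y]\big] \\ &= \rho E\big[X^2 - E^2[X|Y]\big] \\ &= \rho E\big[(X - E[X|Y])^2\big], \end{aligned}$$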


as desired. 
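A quick worked check of (4), using the standard Gaussian input $X \sim \mathcal{N}(0,1)$ as an illustrative special case (our choice, not one made in the text), where every quantity has a closed form: $Y = \rho X + Z \sim \mathcal{N}(0, 1 + \rho^2)$ and $E[X \mid Y] = \rho Y/(1 + \rho^2)$.

```latex
I(X;Y) = \tfrac{1}{2}\log(1+\rho^2)
  \quad\Longrightarrow\quad
  \frac{d}{d\rho} I(X;Y) = \frac{\rho}{1+\rho^2},
\qquad
E\big[(X - E[X \mid Y])^2\big] = 1 - \frac{\rho^2}{1+\rho^2} = \frac{1}{1+\rho^2}.
```

So both sides of (4) equal $\rho/(1+\rho^2)$ in this case.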

2.2 A new proof of de Bruijn’s identity

De Bruijn’s identity below is a fundamental relationship between differential entropy and Fisher information. Based on the proposed approach, we give a new proof of this classical result.

Theorem 2.1. Let $X$ be any random variable with finite variance and let $Z$ be an independent standard normally distributed random variable. Then, for any $t > 0$,

$$\frac{d}{dt} H(X + \sqrt{t} Z) = \frac{1}{2} J(X + \sqrt{t} Z), \qquad (5)$$

where $J(\cdot)$ denotes the Fisher information.
Proof. First, define

$$Y = X + \sqrt{t} Z,$$

whose density function can be computed as

$$f_Y(y) = \int_{\mathbb{R}} f_X(x) f_{Y|X}(y|x)\,dx = \int_{\mathbb{R}} f_X(x) \frac{1}{\sqrt{2\pi t}} e^{-(y - x)^2/(2t)}\,dx.$$

Immediately, we have

$$f_Y(Y) = f_Y(X + \sqrt{t} Z) = \int_{\mathbb{R}} f_X(x) \frac{1}{\sqrt{2\pi t}} e^{-(X + \sqrt{t} Z - x)^2/(2t)}\,dx.$$


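Now, taking the derivative with respect to $t$, we obtain

$$\begin{aligned} \frac{d}{dt} f_Y(Y) &= \int_{\mathbb{R}} f_X(x) \frac{1}{\sqrt{2\pi t}} e^{-(X + \sqrt{t} Z - x)^2/(2t)} \left(\frac{(X - x)(X + \sqrt{t} Z - x)}{2t^2} - \frac{1}{2t}\right) dx \\ &= \int_{\mathbb{R}} \left(\frac{(X - x)(Y - x)}{2t^2} - \frac{1}{2t}\right) f_{Y|X}(Y|x) f_X(x)\,dx \\ &= f_Y(Y) \int_{\mathbb{R}} \left(\frac{(X - x)(Y - x)}{2t^2} - \frac{1}{2t}\right) f_{X|Y}(x|Y)\,dx. \end{aligned}$$

It then follows that

$$\begin{aligned} \frac{d}{dt} H(Y) &= -\frac{d}{dt} E[\log f_Y(Y)] = -E\left[\frac{\frac{d}{dt} f_Y(Y)}{f_Y(Y)}\right] \\ &= -E\left[\int_{\mathbb{R}} \left(\frac{(X - x)(Y - x)}{2t^2} - \frac{1}{2t}\right) f_{X|Y}(x|Y)\,dx\right] \\ &= \frac{E\big[-XY + (X + Y) E[X|Y] - E[X^2|Y]\big]}{2t^2} + \frac{1}{2t} \\ &= \frac{-E[X^2] + E\big[E^2[X|Y]\big]}{2t^2} + \frac{1}{2t}. \end{aligned}$$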

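On the other hand, similarly as above,

$$f_Y'(Y) = \int_{\mathbb{R}} f_X(x) \frac{1}{\sqrt{2\pi t}} e^{-(Y - x)^2/(2t)}\, \frac{x - Y}{t}\,dx = f_Y(Y) \int_{\mathbb{R}} \frac{x - Y}{t} f_{X|Y}(x|Y)\,dx.$$

It then follows that the Fisher information on the right-hand side of (5) can be computed as

$$J(Y) = E\left[\left(\frac{f_Y'(Y)}{f_Y(Y)}\right)^2\right] = \frac{E\big[E^2[X|Y] + Y^2 - 2E[X|Y]\,Y\big]}{t^2} = \frac{E\big[E^2[X|Y]\big] + E[Y^2] - 2E[XY]}{t^2},$$

which, by the fact that $t = E[(X - Y)^2]$, implies that $\frac{1}{2} J(Y)$ is equal to the left-hand side of (5). The theorem then immediately follows. □

Remark 2.2. The new proof of de Bruijn’s identity actually further reveals that

$$\frac{d}{dt} H(X + \sqrt{t} Z) = \frac{1}{2} J(X + \sqrt{t} Z) = \frac{1}{2t^2} E\big[(Y - E[X|Y])^2\big]. \qquad (6)$$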
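A numerical sketch of (5) and (6) for a hypothetical binary input $X \in \{-1, +1\}$ (an illustrative choice): here $f_Y(y) = e^{-(y^2+1)/(2t)}\cosh(y/t)/\sqrt{2\pi t}$, the score is $f_Y'(y)/f_Y(y) = (\tanh(y/t) - y)/t$, and $E[X \mid Y = y] = \tanh(y/t)$. The code compares a finite-difference derivative of $H(Y)$ with $\frac{1}{2}J(Y)$, and also checks $J(Y) = (t - \mathrm{mmse})/t^2$, which is (6) rewritten using the orthogonality of the estimation error. Integration limits, grid size and step size are assumptions.

```python
import math


def trapezoid(f, lo=-20.0, hi=20.0, n=8000):
    """Composite trapezoid rule; the integrands below decay like a Gaussian."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def log_cosh(u):
    """Numerically stable log(cosh(u))."""
    u = abs(u)
    return u + math.log1p(math.exp(-2.0 * u)) - math.log(2.0)


def log_f_y(y, t):
    """Log-density of Y = X + sqrt(t) Z for X uniform on {-1, +1}."""
    return -(y * y + 1.0) / (2.0 * t) + log_cosh(y / t) - 0.5 * math.log(2.0 * math.pi * t)


def entropy(t):
    """Differential entropy H(Y) in nats."""
    def integrand(y):
        lf = log_f_y(y, t)
        return -math.exp(lf) * lf
    return trapezoid(integrand)


def fisher_info(t):
    """J(Y) = E[score^2] with score (tanh(y/t) - y) / t."""
    return trapezoid(lambda y: math.exp(log_f_y(y, t)) * ((math.tanh(y / t) - y) / t) ** 2)


def mmse(t):
    """E[(X - E[X|Y])^2] = E[1 - tanh^2(Y/t)]."""
    return trapezoid(lambda y: math.exp(log_f_y(y, t)) * (1.0 - math.tanh(y / t) ** 2))


t, step = 1.0, 1e-4
dh_dt = (entropy(t + step) - entropy(t - step)) / (2 * step)
print(f"dH/dt = {dh_dt:.6f}, J/2 = {0.5 * fisher_info(t):.6f}, "
      f"(t - mmse)/(2 t^2) = {(t - mmse(t)) / (2 * t * t):.6f}")
```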
3 The Extended I-MMSE Relation in Discrete Time


In this section, using the ideas and techniques illustrated in Section 2, we give extensions of the I-MMSE relation (2) to channels with feedback or memory.

We start with the following general theorem on a discrete-time system: 

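Theorem 3.1. Consider the following discrete-time system

$$Y_i = \rho\, g_i(W_1^i, Y_1^{i-1}) + Z_i, \quad i = 1, \ldots, n, \qquad (7)$$

where $\rho \in \mathbb{R}_+$, all $W_i$ are independent of all $Z_i$, which are i.i.d. standard normal random variables, and each $g_i(\cdot, \cdot)$ is a continuous function differentiable in its second parameter. Assume that for any $i$ and any compact subset $K \subset \mathbb{R}_+$,

$$E\left[\sup_{\rho \in K} g_i^2(W_1^i, Y_1^{i-1})\right] < \infty, \qquad E\left[\sup_{\rho \in K} \left(\frac{d}{d\rho} g_i(W_1^i, Y_1^{i-1})\right)^2\right] < \infty. \qquad (8)$$

Then we have

$$\frac{d}{d\rho} I(W_1^n; Y_1^n) = \rho \sum_{i=1}^n E\big[(g_i - E[g_i|Y_1^n])^2\big] + \rho^2 \sum_{i=1}^n E\left[(g_i - E[g_i|Y_1^n]) \frac{d}{d\rho} g_i\right], \qquad (9)$$

where we have written $g_i(W_1^i, Y_1^{i-1})$ simply as $g_i$.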

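Proof. Note that

$$I(W_1^n; Y_1^n) = H(Y_1^n) - \sum_{i=1}^n H(Y_i | W_1^n, Y_1^{i-1}) = H(Y_1^n) - n H(Z_1),$$

which immediately implies

$$\frac{d}{d\rho} I(W_1^n; Y_1^n) = -E\left[\frac{d}{d\rho} \log f_{Y_1^n}(Y_1^n)\right] = -E\left[\frac{\frac{d}{d\rho} f_{Y_1^n}(Y_1^n)}{f_{Y_1^n}(Y_1^n)}\right].$$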

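In the remainder of the proof, we omit the subscripts of the density functions. For instance, $f(y_1^n)$ denotes the density function of $Y_1^n$, $f(Y_1^n)$ denotes the density function of $Y_1^n$ evaluated at $Y_1^n$, and $f(y_1^n|w_1^n)$ denotes the conditional density function of $Y_1^n$ given $W_1^n = w_1^n$. Under the system assumptions, we have

$$f(y_1^n|w_1^n) = \frac{1}{(\sqrt{2\pi})^n} \exp\left\{-\sum_{i=1}^n \big(y_i - \rho g_i(w_1^i, y_1^{i-1})\big)^2/2\right\},$$

and furthermore,

$$\begin{aligned} \frac{d}{d\rho} f(Y_1^n|w_1^n) &= \frac{1}{(\sqrt{2\pi})^n} \frac{d}{d\rho} \exp\left\{-\sum_{i=1}^n \big(Y_i - \rho g_i(w_1^i, Y_1^{i-1})\big)^2/2\right\} \\ &= \frac{1}{(\sqrt{2\pi})^n} \frac{d}{d\rho} \exp\left\{-\sum_{i=1}^n \big(\rho g_i(W_1^i, Y_1^{i-1}) - \rho g_i(w_1^i, Y_1^{i-1}) + Z_i\big)^2/2\right\} \\ &= -f(Y_1^n|w_1^n) \sum_{i=1}^n \big(Y_i - \rho g_i(w_1^i, Y_1^{i-1})\big) \Big(g_i(W_1^i, Y_1^{i-1}) - g_i(w_1^i, Y_1^{i-1}) \\ &\qquad + \rho \frac{d}{d\rho}\big(g_i(W_1^i, Y_1^{i-1}) - g_i(w_1^i, Y_1^{i-1})\big)\Big). \end{aligned}$$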
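It then follows that

$$\begin{aligned} \frac{d}{d\rho} f(Y_1^n) &= \frac{d}{d\rho} \int_{\mathbb{R}^n} f(Y_1^n|w_1^n) f(w_1^n)\,dw_1^n = \int_{\mathbb{R}^n} \frac{d}{d\rho} f(Y_1^n|w_1^n)\, f(w_1^n)\,dw_1^n \\ &= -\int_{\mathbb{R}^n} \sum_{i=1}^n \big(Y_i - \rho g_i(w_1^i, Y_1^{i-1})\big) \Big(g_i(W_1^i, Y_1^{i-1}) - g_i(w_1^i, Y_1^{i-1}) + \rho \frac{d}{d\rho}\big(g_i(W_1^i, Y_1^{i-1}) - g_i(w_1^i, Y_1^{i-1})\big)\Big) f(Y_1^n|w_1^n) f(w_1^n)\,dw_1^n \\ &= -f(Y_1^n) \int_{\mathbb{R}^n} \sum_{i=1}^n \big(Y_i - \rho g_i(w_1^i, Y_1^{i-1})\big) \Big(g_i(W_1^i, Y_1^{i-1}) - g_i(w_1^i, Y_1^{i-1}) + \rho \frac{d}{d\rho}\big(g_i(W_1^i, Y_1^{i-1}) - g_i(w_1^i, Y_1^{i-1})\big)\Big) f(w_1^n|Y_1^n)\,dw_1^n. \end{aligned}$$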

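Writing $g_i(W_1^i, Y_1^{i-1})$ and $g_i(w_1^i, Y_1^{i-1})$ as $g_i$ and $\tilde{g}_i$, respectively, and using the fact that for any measurable function $\varphi$,

$$\int_{\mathbb{R}^n} \varphi(w_1^n, Y_1^n) f(w_1^n|Y_1^n)\,dw_1^n = E[\varphi(W_1^n, Y_1^n) | Y_1^n],$$

we further compute

$$\begin{aligned} \frac{d}{d\rho} f(Y_1^n) &= -f(Y_1^n) \int_{\mathbb{R}^n} \sum_{i=1}^n (Y_i - \rho \tilde{g}_i) \left(\Big(g_i + \rho \frac{d}{d\rho} g_i\Big) - \Big(\tilde{g}_i + \rho \frac{d}{d\rho} \tilde{g}_i\Big)\right) f(w_1^n|Y_1^n)\,dw_1^n \\ &= -f(Y_1^n) \sum_{i=1}^n \left(\Big(g_i + \rho \frac{d}{d\rho} g_i\Big) E[Y_i - \rho g_i | Y_1^n] - E\Big[\Big(g_i + \rho \frac{d}{d\rho} g_i\Big)(Y_i - \rho g_i) \Big| Y_1^n\Big]\right). \end{aligned}$$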


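Continuing similarly as in the proof of (4), we eventually obtain

$$\begin{aligned} \frac{d}{d\rho} I(W_1^n; Y_1^n) &= E\left[\sum_{i=1}^n \Big(g_i + \rho \frac{d}{d\rho} g_i\Big)\big(\rho g_i - \rho E[g_i|Y_1^n]\big)\right] \\ &= \rho \sum_{i=1}^n E\big[(g_i - E[g_i|Y_1^n])^2\big] + \rho^2 \sum_{i=1}^n E\left[(g_i - E[g_i|Y_1^n]) \frac{d}{d\rho} g_i\right], \end{aligned}$$

as desired. □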


Remark 3.2. Theorem 3.1 still holds if each $g_i$ is instead a Lebesgue measurable function (again, differentiable in its second parameter); this case, however, is less relevant to practical engineering applications.


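Remark 3.3. It can be readily checked that

$$E\left[(g_i - E[g_i|Y_1^n]) \frac{d}{d\rho} g_i\right] = E\left[(g_i - E[g_i|Y_1^n]) \left(\frac{d}{d\rho} g_i - E\Big[\frac{d}{d\rho} g_i \Big| Y_1^n\Big]\right)\right],$$

which means that (9) can be rewritten in the following more symmetric form:

$$\frac{d}{d\rho} I(W_1^n; Y_1^n) = \rho \sum_{i=1}^n E\big[(g_i - E[g_i|Y_1^n])^2\big] + \rho^2 \sum_{i=1}^n E\left[(g_i - E[g_i|Y_1^n]) \left(\frac{d}{d\rho} g_i - E\Big[\frac{d}{d\rho} g_i \Big| Y_1^n\Big]\right)\right].$$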

Remark 3.4. Consider the discrete-time system in (7). Rewriting all $W_i$ as $M$ and each $g_i$ as $X_i$, we then have the following discrete-time Gaussian channel with feedback:

$$Y_i = \sqrt{\mathrm{snr}}\,X_i(M, Y_1^{i-1}) + Z_i, \quad i = 1, 2, \ldots, n,$$

where $M$ is interpreted as the message to be transmitted and $X_i$, $Y_i$ are the channel inputs and outputs, respectively. It is well known that for such a feedback channel,

$$I(X_1^n \to Y_1^n) = I(M; Y_1^n),$$

where $I(X_1^n \to Y_1^n)$ is the directed information between $X_1^n$ and $Y_1^n$. Then, applying Theorem 3.1 and the chain rule for derivatives, we have

$$\frac{d}{d\,\mathrm{snr}} I(X_1^n \to Y_1^n) = \frac{1}{2} \sum_{i=1}^n E\big[(X_i - E[X_i|Y_1^n])^2\big] + \mathrm{snr} \sum_{i=1}^n E\left[(X_i - E[X_i|Y_1^n]) \frac{d}{d\,\mathrm{snr}} X_i\right], \qquad (10)$$

where $X_i = X_i(M, Y_1^{i-1})$. This yields an extension of the I-MMSE relation to discrete-time Gaussian channels with feedback.
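For reference, directed information is usually defined (following Massey) as a sum of conditional mutual informations. The short derivation below assumes only that each $X_i$ is a deterministic function of $(M, Y_1^{i-1})$ and that $Z_i$ is independent of $(M, Z_1^{i-1})$; under these assumptions, $H(Y_i \mid X_1^i, Y_1^{i-1}) = H(Y_i \mid M, Y_1^{i-1}) = H(Z_i)$, which is why the identity used above holds:

```latex
I(X_1^n \to Y_1^n) := \sum_{i=1}^n I(X_1^i; Y_i \mid Y_1^{i-1})
  = \sum_{i=1}^n \big[ H(Y_i \mid Y_1^{i-1}) - H(Z_i) \big]
  = \sum_{i=1}^n \big[ H(Y_i \mid Y_1^{i-1}) - H(Y_i \mid M, Y_1^{i-1}) \big]
  = \sum_{i=1}^n I(M; Y_i \mid Y_1^{i-1})
  = I(M; Y_1^n).
```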


Remark 3.5. Alternatively, rewriting each $W_i$ as $X_i$, we have the following discrete-time Gaussian channel with input and output memory (it has been observed that such a channel is suitable for modeling some storage systems, such as flash memories [?]):

$$Y_i = \sqrt{\mathrm{snr}}\,g_i(X_1^i, Y_1^{i-1}) + Z_i, \quad i = 1, 2, \ldots, n,$$

where $g_i$ is interpreted as “part” of the channel and $X_i$, $Y_i$ are the channel inputs and outputs, respectively. Then, by Theorem 3.1 and the chain rule, we obtain

$$\frac{d}{d\,\mathrm{snr}} I(X_1^n; Y_1^n) = \frac{1}{2} \sum_{i=1}^n E\big[(g_i - E[g_i|Y_1^n])^2\big] + \mathrm{snr} \sum_{i=1}^n E\left[(g_i - E[g_i|Y_1^n]) \frac{d}{d\,\mathrm{snr}} g_i\right], \qquad (11)$$

where $g_i = g_i(X_1^i, Y_1^{i-1})$. This yields an extension of the I-MMSE relation to discrete-time Gaussian channels with input and output memory.


4 The Extended I-MMSE Relation in Continuous Time 

As elaborated in the following theorem, the continuous-time I-MMSE relation, the continuous-time analog of (2), has been established in [14].

Theorem 4.1 (Theorem 6 of [14]). Consider the following continuous-time Gaussian channel

$$Y(t) = \sqrt{\mathrm{snr}} \int_0^t X(s)\,ds + B(t), \quad t \in [0, T],$$
where $\{X(s)\}$ is the channel input satisfying the power constraint

$$\int_0^T E[X^2(s)]\,ds < \infty \qquad (12)$$

and $\{B(t)\}$ is a standard Brownian motion. Then, we have

$$\frac{d}{d\,\mathrm{snr}} I(X_0^T; Y_0^T) = \frac{1}{2} \int_0^T E\big[(X(s) - E[X(s)|Y_0^T])^2\big]\,ds. \qquad (13)$$
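As a simple worked instance of (13), consider the illustrative assumption of a time-invariant input $X(s) \equiv X$ with $E[X^2] < \infty$ (a special case chosen here, not taken from the text). By Girsanov's theorem, the likelihood of the path $Y_0^T$ given $X = x$ is $\exp\{\sqrt{\mathrm{snr}}\,x\,Y(T) - \mathrm{snr}\,x^2 T/2\}$, so $Y(T)$ is a sufficient statistic and $Y(T)/\sqrt{T} = \sqrt{\mathrm{snr}\,T}\,X + \mathcal{N}(0,1)$ reduces everything to the scalar channel (1) at SNR $\mathrm{snr}\,T$. Writing $I_1(\gamma)$ and $\mathrm{mmse}_1(\gamma)$ (hypothetical shorthand) for the mutual information and MMSE of channel (1) at SNR $\gamma$:

```latex
I(X_0^T; Y_0^T) = I\big(X; Y(T)/\sqrt{T}\big) = I_1(\mathrm{snr}\,T),
\qquad
E\big[(X(s) - E[X(s) \mid Y_0^T])^2\big] = \mathrm{mmse}_1(\mathrm{snr}\,T) \ \text{for all } s \in [0, T],

\frac{d}{d\,\mathrm{snr}} I(X_0^T; Y_0^T)
  = T\, I_1'(\mathrm{snr}\,T)
  \overset{(2)}{=} \frac{T}{2}\, \mathrm{mmse}_1(\mathrm{snr}\,T)
  = \frac{1}{2} \int_0^T E\big[(X(s) - E[X(s) \mid Y_0^T])^2\big]\, ds .
```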

In this section, using the ideas and techniques illustrated in Section 2, we give extensions of the continuous-time I-MMSE relation to channels with feedback or memory.

We start with a general theorem on a continuous-time system: 

Theorem 4.2. Consider a continuous-time system characterized by the following stochastic differential equation:

$$Y(t) = \rho \int_0^t g(s, W_0^s, Y_0^s)\,ds + B(t), \quad t \in [0, T], \qquad (14)$$

where $\rho \in \mathbb{R}_+$, the continuous random process $\{W(t)\}$ is independent of the standard Brownian motion $\{B(t)\}$, and $g(\cdot, \cdot, \cdot)$ is a deterministic function. Assume that

(a) $g(s, \gamma_0^s, \phi_0^s)$ is defined for all $\gamma(\cdot), \phi(\cdot) \in C[0, T]$, the set of all continuous functions over $[0, T]$, and is itself a continuous function in $s$, $s \in [0, T]$;

(b) the solution $\{Y(t)\}$ to the stochastic differential equation (14) uniquely exists;

(c) for any $s \in [0, T]$, $g(s, W_0^s, Y_0^s)$ is continuously differentiable with respect to $\rho$ with probability 1;

(d) for any compact subset $K \subset \mathbb{R}_+$, we have

$$\int_0^T E\left[\sup_{\rho \in K} g^2(s, W_0^s, Y_0^s)\right] ds < \infty, \qquad \int_0^T E\left[\sup_{\rho \in K} \left(\frac{d}{d\rho} g(s, W_0^s, Y_0^s)\right)^2\right] ds < \infty;$$

(e) $g(s, \gamma_0^s, \phi_0^s)$ is uniformly bounded over all $s \in [0, T]$ and all $\gamma(\cdot), \phi(\cdot) \in C[0, T]$.

Then, we have

$$\frac{d}{d\rho} I(W_0^T; Y_0^T) = \rho \int_0^T E\big[(g(s) - E[g(s)|Y_0^T])^2\big]\,ds + \rho^2 \int_0^T E\left[(g(s) - E[g(s)|Y_0^T]) \frac{d}{d\rho} g(s)\right] ds,$$

where we have written $g(s, W_0^s, Y_0^s)$ simply as $g(s)$.

Strictly speaking, Theorem 4.2 is not a generalization of Theorem 4.1: Condition (e) is stronger than the square integrability condition (12), as one can easily find $g$ satisfying the latter but not the former. As elaborated in the following theorem, at the expense of an extra yet mild condition (see (f) in the following theorem), Condition (e) can be relaxed to an integrability condition (see (g)).
Theorem 4.3. Consider the continuous-time system (14) satisfying Conditions (a), (b), (c), (d) and the following conditions:

(f) for any $a > 0$ and any $t \in [0, T]$,

$$P\left(\int_0^t g^2(s, W_0^s, Y_0^s)\,ds = a\right) = 0;$$

(g) with probability 1, we have (note that the third parameter in the following $g$ function is $B_0^s$, rather than $Y_0^s$)

$$\int_0^T g^2(s, W_0^s, B_0^s)\,ds < \infty.$$

Then, we have

$$\frac{d}{d\rho} I(W_0^T; Y_0^T) = \rho \int_0^T E\big[(g(s) - E[g(s)|Y_0^T])^2\big]\,ds + \rho^2 \int_0^T E\left[(g(s) - E[g(s)|Y_0^T]) \frac{d}{d\rho} g(s)\right] ds, \qquad (15)$$

where we have written $g(s, W_0^s, Y_0^s)$ simply as $g(s)$.

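Remark 4.4. Similarly as in Remark 3.3, we can obtain the following more symmetric formula:

$$\frac{d}{d\rho} I(W_0^T; Y_0^T) = \rho \int_0^T E\big[(g(s) - E[g(s)|Y_0^T])^2\big]\,ds + \rho^2 \int_0^T E\left[(g(s) - E[g(s)|Y_0^T]) \left(\frac{d}{d\rho} g(s) - E\Big[\frac{d}{d\rho} g(s) \Big| Y_0^T\Big]\right)\right] ds.$$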

Remark 4.5. Parallel to Remark 3.4, the continuous-time system in (14) can be interpreted as the following continuous-time Gaussian channel with feedback:

$$Y(t) = \sqrt{\mathrm{snr}} \int_0^t X(s, M, Y_0^s)\,ds + B(t), \quad t \in [0, T].$$

An application of Theorem 4.2 then yields

$$\frac{d}{d\,\mathrm{snr}} I(M; Y_0^T) = \frac{1}{2} \int_0^T E\big[(X(s) - E[X(s)|Y_0^T])^2\big]\,ds + \mathrm{snr} \int_0^T E\left[(X(s) - E[X(s)|Y_0^T]) \frac{d}{d\,\mathrm{snr}} X(s)\right] ds, \qquad (16)$$

where $X(s)$ is the abbreviated form of $X(s, M, Y_0^s)$. This gives an extension of the I-MMSE relation to continuous-time Gaussian channels with feedback.

Parallel to Remark 3.5, the system can also be interpreted as the following continuous-time Gaussian channel with input and output memory:

$$Y(t) = \sqrt{\mathrm{snr}} \int_0^t g(s, X_0^s, Y_0^s)\,ds + B(t), \quad t \in [0, T].$$

An application of Theorem 4.2 then yields

$$\frac{d}{d\,\mathrm{snr}} I(X_0^T; Y_0^T) = \frac{1}{2} \int_0^T E\big[(g(s) - E[g(s)|Y_0^T])^2\big]\,ds + \mathrm{snr} \int_0^T E\left[(g(s) - E[g(s)|Y_0^T]) \frac{d}{d\,\mathrm{snr}} g(s)\right] ds, \qquad (17)$$

where $g(s)$ is the abbreviated form of $g(s, X_0^s, Y_0^s)$. This gives an extension of the I-MMSE relation to continuous-time Gaussian channels with input and output memory.

Remark 4.6. It can be readily verified that Theorem 4.3, when interpreted as in the previous remark, includes Theorem 4.1 as a special case; see more detailed explanations in Remark 4.8.
4.1 Properties of the solution to (14)


In this section, we give sufficient conditions that guarantee that the solution $Y$ to (14) uniquely exists (Condition (b) in Theorem 4.2) and, moreover, that $g(s, W_0^s, Y_0^s)$ is differentiable with respect to $\rho$ (Condition (c) in Theorem 4.2). More precisely, we have the following proposition.


Proposition 4.7. Under the following conditions:

• $Dg(s, \gamma_0^s, \phi_0^s)$, the Fréchet derivative of $g$ with respect to its third parameter $\phi(\cdot)$, exists for any $s \in [0, T]$ and any $\gamma(\cdot), \phi(\cdot) \in C[0, T]$;

• (extended uniform Lipschitz conditions) there exists a constant $C$ such that for all $s \in [0, T]$ and all $\gamma(\cdot), \phi(\cdot), \psi(\cdot) \in C[0, T]$, we have

$$|g(s, \gamma_0^s, \phi_0^s) - g(s, \gamma_0^s, \psi_0^s)| \le C \|\phi_0^s - \psi_0^s\|_\infty$$

and

$$\|Dg(s, \gamma_0^s, \phi_0^s) - Dg(s, \gamma_0^s, \psi_0^s)\| \le C \|\phi_0^s - \psi_0^s\|_\infty;$$

• (extended linear growth conditions) there exists a constant $C$ such that for all $s \in [0, T]$ and all $\gamma(\cdot), \phi(\cdot) \in C[0, T]$, we have

$$g^2(s, \gamma_0^s, \phi_0^s) \le C\big(1 + \|\gamma_0^s\|_\infty^2 + \|\phi_0^s\|_\infty^2\big)$$

and

$$\|Dg(s, \gamma_0^s, \phi_0^s)\|^2 \le C\big(1 + \|\gamma_0^s\|_\infty^2 + \|\phi_0^s\|_\infty^2\big),$$

the solution $Y$ to the continuous-time system (14) uniquely exists, and moreover, with probability 1, $g(s, W_0^s, Y_0^s)$ is differentiable with respect to $\rho$.


Proof. We only sketch the proof, as it is essentially the standard argument for the existence and uniqueness of the solution to a stochastic differential equation under the well-known uniform Lipschitz and linear growth conditions; see, e.g., the proof of Theorem 2.2 in Chapter 5 of [?].

Consider the following Picard iteration:

$$Y_{(0)}(t) = 0, \qquad Y_{(n+1)}(t) = \rho \int_0^t g(s, W_0^s, Y_{(n),0}^s)\,ds + B(t), \quad t \in [0, T].$$

It can be easily verified that, for any $n$ and any $t \in [0, T]$, $Y_{(n)}(t)$ is differentiable with respect to $\rho$. Letting $Z_{(n)}(t) = \frac{d}{d\rho} Y_{(n)}(t)$ for all $n$, we have

$$Z_{(0)}(t) = 0, \qquad Z_{(n+1)}(t) = \int_0^t g(s, W_0^s, Y_{(n),0}^s)\,ds + \rho \int_0^t Dg(s, W_0^s, Y_{(n),0}^s)(Z_{(n),0}^s)\,ds, \quad t \in [0, T].$$

Now, applying the standard argument for the existence and uniqueness of the solution to a stochastic differential equation, we deduce that there exists a stochastic process $\{Y(t), t \in [0, T]\}$ such that for any compact set $K \subset \mathbb{R}_+$,

$$\lim_{n \to \infty} \sup_{\rho \in K,\, t \in [0, T]} |Y_{(n)}(t) - Y(t)| = 0 \quad \text{a.s.},$$

and furthermore, there exists a stochastic process $\{Z(t), t \in [0, T]\}$ such that for any compact set $K \subset \mathbb{R}_+$,

$$\lim_{n \to \infty} \sup_{\rho \in K,\, t \in [0, T]} |Z_{(n)}(t) - Z(t)| = 0 \quad \text{a.s.}$$

It then follows that $Y(t)$ is differentiable with respect to $\rho$ with $\frac{d}{d\rho} Y(t) = Z(t)$ with probability 1, and consequently, $g(s, W_0^s, Y_0^s)$ is differentiable with respect to $\rho$. □
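A discretized sketch of the Picard iteration above, under illustrative assumptions that are not from the text: no $W$ dependence, the hypothetical drift $g(s, \gamma_0^s, \phi_0^s) = \sin(\phi(s))$ (which satisfies the Lipschitz and linear growth conditions, with $Dg(s, \gamma_0^s, \phi_0^s)(\zeta) = \cos(\phi(s))\,\zeta(s)$), a seeded Brownian path on a uniform grid, and left-point Riemann sums. The iterates $Y_{(n)}$ and $Z_{(n)}$ follow the two recursions; on the grid, the limit of $Y_{(n)}$ coincides with the Euler–Maruyama scheme, and the limit of $Z_{(n)}$ can be compared with a finite-difference derivative of that scheme in $\rho$.

```python
import math
import random


def brownian_path(n_steps, T, seed):
    """Brownian motion sampled at t_k = k * T / n_steps, starting from B(0) = 0."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def picard(rho, B, T, n_iter=60):
    """Picard iterates for Y(t) = rho * int_0^t sin(Y(s)) ds + B(t) and Z = dY/drho.

    Starting from Y_(0) = Z_(0) = 0, each sweep computes on the grid
      Y_(n+1)(t_k) = rho * sum_{j<k} sin(Y_(n)(t_j)) dt + B(t_k),
      Z_(n+1)(t_k) = sum_{j<k} [sin(Y_(n)(t_j)) + rho cos(Y_(n)(t_j)) Z_(n)(t_j)] dt.
    """
    n = len(B) - 1
    dt = T / n
    Y = [0.0] * (n + 1)
    Z = [0.0] * (n + 1)
    for _ in range(n_iter):
        Y_new, Z_new = [B[0]], [0.0]
        int_g = int_dz = 0.0
        for k in range(n):
            int_g += math.sin(Y[k]) * dt
            int_dz += (math.sin(Y[k]) + rho * math.cos(Y[k]) * Z[k]) * dt
            Y_new.append(rho * int_g + B[k + 1])
            Z_new.append(int_dz)
        Y, Z = Y_new, Z_new
    return Y, Z


def euler(rho, B, T):
    """Euler-Maruyama scheme driven by the same Brownian increments."""
    n = len(B) - 1
    dt = T / n
    Y = [B[0]]
    for k in range(n):
        Y.append(Y[k] + rho * math.sin(Y[k]) * dt + (B[k + 1] - B[k]))
    return Y


T, rho, h = 1.0, 1.5, 1e-5
B = brownian_path(500, T, seed=7)
Y, Z = picard(rho, B, T)
Y_euler = euler(rho, B, T)
fd = [(a - b) / (2 * h) for a, b in zip(euler(rho + h, B, T), euler(rho - h, B, T))]
print(max(abs(a - b) for a, b in zip(Y, Y_euler)))  # Picard limit vs. Euler path
print(max(abs(a - b) for a, b in zip(Z, fd)))       # Z vs. finite-difference d/drho
```

Because each left-point sum only uses earlier grid points, the errors of the iterates shrink factorially in the number of sweeps, mirroring the continuous-time argument.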


4.2 Proof of Theorem 4.2

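Fix $W = w$ and let $Y^{(w)}$ be such that

$$Y^{(w)}(t) = \rho \int_0^t g\big(s, w_0^s, (Y^{(w)})_0^s\big)\,ds + B(t), \quad t \in [0, T].$$

Then, by Theorem 7.1 of [24] (it can be checked that its assumptions are implied by Condition (e)), we observe that $\mu_{Y^{(w)}} \sim \mu_B$, where “$\sim$” means “equivalent”, and furthermore,

$$\frac{d\mu_{Y^{(w)}}}{d\mu_B}\big((Y^{(w)})_0^T\big) = \exp\left\{\rho \int_0^T g\big(s, w_0^s, (Y^{(w)})_0^s\big)\,dY^{(w)}(s) - \frac{\rho^2}{2} \int_0^T g^2\big(s, w_0^s, (Y^{(w)})_0^s\big)\,ds\right\}.$$

It then follows from Lemma 4.10 in [?] that

$$\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T | w_0^T) = \exp\left\{\rho \int_0^T g(s, w_0^s, Y_0^s)\,dY(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, Y_0^s)\,ds\right\}.$$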

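Note that, by definition, we have

$$\begin{aligned} I(W_0^T; Y_0^T) &= E\left[\log \frac{d\mu_{WY}}{d(\mu_W \times \mu_Y)}(W_0^T, Y_0^T)\right] \\ &= E\left[\log \frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T | W_0^T)\right] - E\left[\log \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right] \\ &= \frac{\rho^2}{2} \int_0^T E[g^2(s)]\,ds - E\left[\log \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right]. \end{aligned}$$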

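Taking the derivative with respect to $\rho$ then yields

$$\begin{aligned} \frac{d}{d\rho} I(W_0^T; Y_0^T) &= \rho \int_0^T E[g^2(s)]\,ds + \frac{\rho^2}{2} \frac{d}{d\rho} \int_0^T E[g^2(s)]\,ds - \frac{d}{d\rho} E\left[\log \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right] \\ &= \rho \int_0^T E[g^2(s)]\,ds + \rho^2 \int_0^T E\left[g(s) \frac{d}{d\rho} g(s)\right] ds - \frac{d}{d\rho} E\left[\log \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right]. \end{aligned}$$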
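Writing $g(s, w_0^s, Y_0^s)$ as $\tilde{g}(s)$, we have

$$\begin{aligned} \frac{d}{d\rho} \frac{d\mu_Y}{d\mu_B}(Y_0^T) &= \frac{d}{d\rho} \int \frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T | w_0^T)\,\mu_W(dw) \\ &= \frac{d}{d\rho} \int \exp\left\{\rho \int_0^T \tilde{g}(s)\,dY(s) - \frac{\rho^2}{2} \int_0^T \tilde{g}^2(s)\,ds\right\} \mu_W(dw) \\ &= \frac{d}{d\rho} \int \exp\left\{\rho^2 \int_0^T \tilde{g}(s) g(s)\,ds + \rho \int_0^T \tilde{g}(s)\,dB(s) - \frac{\rho^2}{2} \int_0^T \tilde{g}^2(s)\,ds\right\} \mu_W(dw) \\ &= \int \left(\int_0^T \tilde{g}(s)\,dY(s) + \rho \int_0^T \frac{d}{d\rho} \tilde{g}(s)\,dY(s) + \rho \int_0^T \tilde{g}(s)\big(g(s) - \tilde{g}(s)\big)\,ds \right. \\ &\qquad \left. + \rho^2 \int_0^T \tilde{g}(s) \frac{d}{d\rho}\big(g(s) - \tilde{g}(s)\big)\,ds\right) \frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T | w_0^T)\,\mu_W(dw) \\ &= \frac{d\mu_Y}{d\mu_B}(Y_0^T) \int \left(\int_0^T \tilde{g}(s)\,dY(s) + \rho \int_0^T \frac{d}{d\rho} \tilde{g}(s)\,dY(s) + \rho \int_0^T \tilde{g}(s)\big(g(s) - \tilde{g}(s)\big)\,ds \right. \\ &\qquad \left. + \rho^2 \int_0^T \tilde{g}(s) \frac{d}{d\rho}\big(g(s) - \tilde{g}(s)\big)\,ds\right) \mu_{W|Y}(dw | Y_0^T) \\ &= \frac{d\mu_Y}{d\mu_B}(Y_0^T) \left(E\left[\int_0^T g(s)\,dY(s) \Big| Y_0^T\right] + \rho E\left[\int_0^T \frac{d}{d\rho} g(s)\,dY(s) \Big| Y_0^T\right] \right. \\ &\qquad + \rho \int_0^T \big(E[g(s)|Y_0^T]\, g(s) - E[g^2(s)|Y_0^T]\big)\,ds \\ &\qquad \left. + \rho^2 \int_0^T \left(\frac{d}{d\rho} g(s)\, E[g(s)|Y_0^T] - E\Big[g(s) \frac{d}{d\rho} g(s) \Big| Y_0^T\Big]\right) ds\right). \end{aligned}$$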


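Note that by the properties of conditional expectation and the Itô integral, we have

$$E\left[E\left[\int_0^T g(s)\,dY(s) \Big| Y_0^T\right]\right] = E\left[\int_0^T g(s)\,dY(s)\right] = \rho \int_0^T E[g^2(s)]\,ds,$$

and similarly,

$$E\left[\int_0^T E[g^2(s)|Y_0^T]\,ds\right] = \int_0^T E[g^2(s)]\,ds$$

and

$$\rho E\left[E\left[\int_0^T \frac{d}{d\rho} g(s)\,dY(s) \Big| Y_0^T\right]\right] = \rho^2 \int_0^T E\left[g(s) \frac{d}{d\rho} g(s)\right] ds.$$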
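It then follows that

$$\begin{aligned} \frac{d}{d\rho} E\left[\log \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right] &= E\left[\frac{d}{d\rho} \log \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right] = E\left[\frac{d}{d\rho} \frac{d\mu_Y}{d\mu_B}(Y_0^T) \Big/ \frac{d\mu_Y}{d\mu_B}(Y_0^T)\right] \\ &= E\left[E\left[\int_0^T g(s)\,dY(s) \Big| Y_0^T\right] + \rho E\left[\int_0^T \frac{d}{d\rho} g(s)\,dY(s) \Big| Y_0^T\right]\right] \\ &\qquad + \rho \int_0^T E\big[E[g(s)|Y_0^T]\, g(s) - E[g^2(s)|Y_0^T]\big]\,ds + \rho^2 \int_0^T E\left[\frac{d}{d\rho} g(s)\, E[g(s)|Y_0^T] - E\Big[g(s) \frac{d}{d\rho} g(s) \Big| Y_0^T\Big]\right] ds \\ &= \rho \int_0^T E\big[E[g(s)|Y_0^T]\, g(s)\big]\,ds + \rho^2 \int_0^T E\left[E[g(s)|Y_0^T] \frac{d}{d\rho} g(s)\right] ds. \end{aligned}$$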


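So we have

$$\begin{aligned} \frac{d}{d\rho} I(W_0^T; Y_0^T) &= \rho \int_0^T E[g^2(s)]\,ds + \rho^2 \int_0^T E\left[g(s) \frac{d}{d\rho} g(s)\right] ds - \rho \int_0^T E\big[E[g(s)|Y_0^T]\, g(s)\big]\,ds - \rho^2 \int_0^T E\left[E[g(s)|Y_0^T] \frac{d}{d\rho} g(s)\right] ds \\ &= \rho \int_0^T E\big[(g(s) - E[g(s)|Y_0^T])^2\big]\,ds + \rho^2 \int_0^T E\left[(g(s) - E[g(s)|Y_0^T]) \frac{d}{d\rho} g(s)\right] ds, \end{aligned}$$

as desired.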


4.3 Proof of Theorem 4.3


The proof consists of the following 6 steps: 

Step 1. First of all, for any fixed $W = w$, by Theorem 7.7 of [23], $\mu_{Y|W=w} \sim \mu_B$ with

$$\frac{d\mu_{Y|W=w}}{d\mu_B}(B_0^T) = \exp\left\{\rho \int_0^T g(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, B_0^s)\,ds\right\},$$

where we have used Conditions (d) and (g) before invoking Theorem 7.7. Moreover, by Condition (d), it follows from Theorem 7.2 that $\mu_Y \ll \mu_B$ with

$$\frac{d\mu_Y}{d\mu_B}(B_0^T) = \int \frac{d\mu_{Y|W=w}}{d\mu_B}(B_0^T)\,\mu_W(dw) = \int \exp\left\{\rho \int_0^T g(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, B_0^s)\,ds\right\} \mu_W(dw),$$

which is obviously positive with probability 1. It then follows from Lemma 6.8 of [53] that $\mu_B \ll \mu_Y$. So, in this step, we have shown that under the conditions specified in the theorem, $\mu_Y \sim \mu_{Y|W=w} \sim \mu_B$.
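The density in Step 1 has unit mean under $\mu_B$, a fact also used in Step 3.2.1. A Monte Carlo sketch under illustrative assumptions (not from the text): a single fixed $w$, so the $\mu_W$ average is trivial; the hypothetical bounded drift $g(s, w_0^s, B_0^s) = \sin(B(s))$; and an Itô (left-endpoint) discretization, under which the density is exactly a discrete exponential martingale. Its sample mean should therefore be close to 1 up to Monte Carlo error; $\rho$, the grid and the seed are arbitrary.

```python
import math
import random


def density_sample(rho, n_steps, T, rng):
    """One draw of exp(rho * int g dB - rho^2/2 * int g^2 ds) along a simulated path B.

    Hypothetical drift g(s, B_0^s) = sin(B(s)); the integrand is evaluated at the
    left endpoint of each step, so it is independent of the increment dB (Ito).
    """
    dt = T / n_steps
    b = 0.0
    log_density = 0.0
    for _ in range(n_steps):
        g = math.sin(b)
        db = rng.gauss(0.0, math.sqrt(dt))
        log_density += rho * g * db - 0.5 * rho * rho * g * g * dt
        b += db
    return math.exp(log_density)


rng = random.Random(11)
samples = [density_sample(rho=1.0, n_steps=50, T=1.0, rng=rng) for _ in range(10_000)]
print(f"sample mean = {sum(samples) / len(samples):.3f}, min = {min(samples):.3g}")
```

Since $|g| \le 1$, the second moment of each draw is at most $e^{\rho^2 T}$, which keeps the Monte Carlo error of the sample mean small.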

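Step 2. For any $n$ and $\gamma(\cdot), \phi(\cdot) \in C[0, T]$, we follow [24] and define a truncated version of $g$ as follows:

$$g_{(n)}(t, \gamma_0^t, \phi_0^t) = g(t, \gamma_0^t, \phi_0^t)\, \mathbf{1}\left\{\int_0^t g^2(s, \gamma_0^s, \phi_0^s)\,ds < n\right\}.$$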
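On a discretized path, this truncation simply switches $g$ off once its accumulated energy $\int_0^t g^2\,ds$ reaches $n$. A minimal sketch, assuming the values of $g$ along the path are already given on a uniform grid and the energy is approximated by a left-point Riemann sum (both illustrative choices):

```python
def truncate(g_values, dt, n):
    """Return g_(n)(t_k) = g(t_k) * 1{int_0^{t_k} g^2 ds < n} on a uniform grid.

    The energy int_0^{t_k} g^2 ds is approximated by sum_{j<k} g(t_j)^2 * dt.
    Since the energy is nondecreasing, once g is switched off it stays off.
    """
    out = []
    energy = 0.0
    for g in g_values:
        out.append(g if energy < n else 0.0)
        energy += g * g * dt
    return out


# Constant g = 2 on [0, 1): energy 4 t, so g is kept while 4 t < 2.02.
print(truncate([2.0] * 100, dt=0.01, n=2.02).count(2.0))  # prints 51
```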
Now, define a truncated version of $Y$ as follows:

$$Y_{(n)}(t) = \rho \int_0^t g_{(n)}(s, W_0^s, Y_0^s)\,ds + B(t), \quad t \in [0, T],$$

which, as elaborated on Page 265 in [?], can be rewritten as

$$Y_{(n)}(t) = \rho \int_0^t g_{(n)}(s, W_0^s, Y_{(n),0}^s)\,ds + B(t), \quad t \in [0, T].$$
Jo 

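It is well known (see, e.g., Theorem 6.2.1 of [?]) that

$$I(W_0^T; Y_{(n),0}^T) = \frac{\rho^2}{2} \int_0^T \Big(E\big[g_{(n)}^2(s, W_0^s, Y_{(n),0}^s)\big] - E\big[E^2[g_{(n)}(s, W_0^s, Y_{(n),0}^s) | Y_{(n),0}^s]\big]\Big)\,ds$$

and

$$I(W_0^T; Y_0^T) = \frac{\rho^2}{2} \int_0^T \Big(E\big[g^2(s, W_0^s, Y_0^s)\big] - E\big[E^2[g(s, W_0^s, Y_0^s) | Y_0^s]\big]\Big)\,ds.$$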


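Moreover, it follows from Theorem 4.2 (here, note that extra yet minor care has to be taken since $g_{(n)}(s, W_0^s, Y_{(n),0}^s)$ is only a piecewise differentiable function of $\rho$; cf. Condition (c)) that

$$\begin{aligned} \frac{d}{d\rho} I(W_0^T; Y_{(n),0}^T) &= \rho \int_0^T \Big(E\big[g_{(n)}^2(s, W_0^s, Y_{(n),0}^s)\big] - E\big[E^2[g_{(n)}(s, W_0^s, Y_{(n),0}^s) | Y_{(n),0}^T]\big]\Big)\,ds \\ &\quad + \rho^2 \int_0^T E\left[\big(g_{(n)}(s, W_0^s, Y_{(n),0}^s) - E[g_{(n)}(s, W_0^s, Y_{(n),0}^s) | Y_{(n),0}^T]\big) \frac{d}{d\rho} g_{(n)}(s, W_0^s, Y_{(n),0}^s)\right] ds. \end{aligned} \qquad (18)$$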

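Step 3. In this step, we will prove that

$$\begin{aligned} \lim_{n \to \infty} \frac{d}{d\rho} I(W_0^T; Y_{(n),0}^T) &= \rho \int_0^T \Big(E\big[g^2(s, W_0^s, Y_0^s)\big] - E\big[E^2[g(s, W_0^s, Y_0^s) | Y_0^T]\big]\Big)\,ds \\ &\quad + \rho^2 \int_0^T E\left[\big(g(s, W_0^s, Y_0^s) - E[g(s, W_0^s, Y_0^s) | Y_0^T]\big) \frac{d}{d\rho} g(s, W_0^s, Y_0^s)\right] ds. \end{aligned} \qquad (19)$$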


Step 3.1. In this step, we observe that, under Condition (d), an application of the dominated convergence theorem yields

$$\lim_{n \to \infty} \int_0^T E\big[g_{(n)}^2(s, W_0^s, Y_{(n),0}^s)\big]\,ds = \int_0^T E\big[g^2(s, W_0^s, Y_0^s)\big]\,ds.$$

Step 3.2. In this step, we will prove that

$$\lim_{n \to \infty} \int_0^T E\big[E^2[g_{(n)}(s, W_0^s, Y_{(n),0}^s) | Y_{(n),0}^T]\big]\,ds = \int_0^T E\big[E^2[g(s, W_0^s, Y_0^s) | Y_0^T]\big]\,ds. \qquad (20)$$
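First of all, we note that

$$\begin{aligned} E\big[E^2[g_{(n)}(s, W_0^s, Y_{(n),0}^s) | Y_{(n),0}^T]\big] &= E\left[\left(\int g_{(n)}(s, w_0^s, Y_{(n),0}^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(Y_{(n),0}^T | w_0^T)\,\mu_W(dw) \Big/ \frac{d\mu_{Y_{(n)}}}{d\mu_B}(Y_{(n),0}^T)\right)^2\right] \\ &= E\left[\left(\int g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right)^2 \times \left(\frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T)\right)^{-1}\right]. \end{aligned}$$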

We now proceed with the following steps: 

Step 3.2.1. In this step, we prove that, in probability,

$$\frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T) \to \frac{d\mu_Y}{d\mu_B}(B_0^T).$$



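First of all,

$$\frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T) = \int \exp\left(\rho \int_0^T g_{(n)}(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g_{(n)}^2(s, w_0^s, B_0^s)\,ds\right) \mu_W(dw).$$

It then follows from the Itô isometry that

$$\exp\left(\rho \int_0^T g_{(n)}(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g_{(n)}^2(s, w_0^s, B_0^s)\,ds\right)$$

converges to

$$\exp\left(\rho \int_0^T g(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, B_0^s)\,ds\right)$$

in probability. Moreover, it can be easily checked that

$$E\left[\int \exp\left(\rho \int_0^T g_{(n)}(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g_{(n)}^2(s, w_0^s, B_0^s)\,ds\right) \mu_W(dw)\right] = E\left[\frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T)\right] = 1$$

and

$$E\left[\int \exp\left(\rho \int_0^T g(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, B_0^s)\,ds\right) \mu_W(dw)\right] = E\left[\frac{d\mu_Y}{d\mu_B}(B_0^T)\right] = 1.$$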


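It then follows from Theorem 5.5.2 of [?] that

$$\lim_{n \to \infty} E\left[\int \left|\exp\left(\rho \int_0^T g_{(n)}(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g_{(n)}^2(s, w_0^s, B_0^s)\,ds\right) - \exp\left(\rho \int_0^T g(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, B_0^s)\,ds\right)\right| \mu_W(dw)\right] = 0,$$

which further implies that

$$\int \exp\left(\rho \int_0^T g_{(n)}(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g_{(n)}^2(s, w_0^s, B_0^s)\,ds\right) \mu_W(dw)$$

converges to

$$\int \exp\left(\rho \int_0^T g(s, w_0^s, B_0^s)\,dB(s) - \frac{\rho^2}{2} \int_0^T g^2(s, w_0^s, B_0^s)\,ds\right) \mu_W(dw)$$

in probability.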

Step 3.2.2. In this step, we will prove that, in probability,

$$\int g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw) \to \int g(s, w_0^s, B_0^s) \frac{d\mu_{Y|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw).$$

First of all, it is easy to check that, in probability,

$$g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T) \to g(s, w_0^s, B_0^s) \frac{d\mu_{Y|W}}{d\mu_B}(B_0^T | w_0^T).$$

Moreover, we have

$$E\left[\int |g_{(n)}(s, w_0^s, B_0^s)| \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right] = E\big[|g_{(n)}(s, W_0^s, Y_{(n),0}^s)|\big],$$

which converges to

$$E\big[|g(s, W_0^s, Y_0^s)|\big] = E\left[\int |g(s, w_0^s, B_0^s)| \frac{d\mu_{Y|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right].$$

So, similarly as in Step 3.2.1, we deduce that

$$\int g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw) \to \int g(s, w_0^s, B_0^s) \frac{d\mu_{Y|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)$$

in probability.

Step 3.2.3. Note that Steps 3.2.1 and 3.2.2 collectively yield that

$$\left(\int g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right)^2 \times \left(\frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T)\right)^{-1}$$

converges to

$$\left(\int g(s, w_0^s, B_0^s) \frac{d\mu_{Y|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right)^2 \times \left(\frac{d\mu_Y}{d\mu_B}(B_0^T)\right)^{-1}$$

in probability. Now, since $\frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw) \big/ \frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T)$ is a probability measure in $w$, applying Jensen’s inequality, we have

$$\begin{aligned} &\left(\int g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right)^2 \times \left(\frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T)\right)^{-1} \\ &\quad = \left(\int g_{(n)}(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw) \Big/ \frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T)\right)^2 \times \frac{d\mu_{Y_{(n)}}}{d\mu_B}(B_0^T) \\ &\quad \le \int g_{(n)}^2(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw). \end{aligned}$$


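Note that

$$E\left[\int g_{(n)}^2(s, w_0^s, B_0^s) \frac{d\mu_{Y_{(n)}|W}}{d\mu_B}(B_0^T | w_0^T)\,\mu_W(dw)\right] = E\big[g_{(n)}^2(s, W_0^s, Y_{(n),0}^s)\big] \to E\big[g^2(s, W_0^s, Y_0^s)\big] < \infty,$$

where the finiteness is due to Condition (d). Finally, the desired (20) follows from the generalized dominated convergence theorem (see, e.g., Theorem 19 on Page 89 of [35]).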

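Step 3.3. In this step, we establish the following two convergences:

$$\lim_{n \to \infty} \int_0^T E\left[g_{(n)}(s, W_0^s, Y_{(n),0}^s) \frac{d}{d\rho} g_{(n)}(s, W_0^s, Y_{(n),0}^s)\right] ds = \int_0^T E\left[g(s, W_0^s, Y_0^s) \frac{d}{d\rho} g(s, W_0^s, Y_0^s)\right] ds \qquad (21)$$

and

$$\lim_{n \to \infty} \int_0^T E\left[E[g_{(n)}(s, W_0^s, Y_{(n),0}^s) | Y_{(n),0}^T] \frac{d}{d\rho} g_{(n)}(s, W_0^s, Y_{(n),0}^s)\right] ds = \int_0^T E\left[E[g(s, W_0^s, Y_0^s) | Y_0^T] \frac{d}{d\rho} g(s, W_0^s, Y_0^s)\right] ds. \qquad (22)$$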

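Step 3.3.1. In this step, we will prove (21). Writing $g_{(n)}(s,W_0^s,Y_{(n),0}^s)$ and $g(s,W_0^s,Y_0^s)$ as $g_{(n)}(s)$ and $g(s)$ for notational simplicity, we have

$$\begin{aligned}
&\int_0^T E\left[g_{(n)}(s)\frac{d}{d\rho}g_{(n)}(s)\right]ds - \int_0^T E\left[g(s)\frac{d}{d\rho}g(s)\right]ds\\
&\quad= \int_0^T E\left[g_{(n)}(s)\frac{d}{d\rho}g_{(n)}(s)\right]ds - \int_0^T E\left[g(s)\frac{d}{d\rho}g_{(n)}(s)\right]ds + \int_0^T E\left[g(s)\frac{d}{d\rho}g_{(n)}(s)\right]ds - \int_0^T E\left[g(s)\frac{d}{d\rho}g(s)\right]ds\\
&\quad= \int_0^T E\left[\big(g_{(n)}(s)-g(s)\big)\frac{d}{d\rho}g_{(n)}(s)\right]ds + \int_0^T E\left[g(s)\left(\frac{d}{d\rho}g_{(n)}(s)-\frac{d}{d\rho}g(s)\right)\right]ds.
\end{aligned}$$

The desired convergence (21) then follows from the Cauchy-Schwarz inequality and the fact that, as $n$ tends to infinity,

$$\left(\int_0^T E\left[\big(g_{(n)}(s)-g(s)\big)\frac{d}{d\rho}g_{(n)}(s)\right]ds\right)^2 \le \int_0^T E\left[\big(g_{(n)}(s)-g(s)\big)^2\right]ds\,\int_0^T E\left[\left(\frac{d}{d\rho}g_{(n)}(s)\right)^2\right]ds \to 0$$

and

$$\left(\int_0^T E\left[g(s)\left(\frac{d}{d\rho}g_{(n)}(s)-\frac{d}{d\rho}g(s)\right)\right]ds\right)^2 \le \int_0^T E\left[g^2(s)\right]ds\,\int_0^T E\left[\left(\frac{d}{d\rho}g_{(n)}(s)-\frac{d}{d\rho}g(s)\right)^2\right]ds \to 0.$$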
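The add-and-subtract decomposition and the two Cauchy-Schwarz bounds of Step 3.3.1 hold for any finite measure, in particular for an empirical one on sample pairs $(s,\omega)$. A minimal sketch, assuming synthetic samples of $g(s)$ and $\frac{d}{d\rho}g(s)$ and a hypothetical perturbation model for $g_{(n)}$ whose error shrinks like $1/n$, checks that the split is exact and that the bound controls the difference:

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def split_and_bound(gn, g, dgn, dg):
    """Empirical version of Step 3.3.1. Returns
    (mean(gn*dgn) - mean(g*dg), sum of the two split terms, sum of the two Cauchy-Schwarz bounds)."""
    diff = mean([a * b for a, b in zip(gn, dgn)]) - mean([a * b for a, b in zip(g, dg)])
    term1 = mean([(a - c) * b for a, c, b in zip(gn, g, dgn)])   # (g_n - g) * dg_n
    term2 = mean([c * (b - d) for c, b, d in zip(g, dgn, dg)])   # g * (dg_n - dg)
    bound = (math.sqrt(mean([(a - c) ** 2 for a, c in zip(gn, g)]) * mean([b * b for b in dgn]))
             + math.sqrt(mean([c * c for c in g]) * mean([(b - d) ** 2 for b, d in zip(dgn, dg)])))
    return diff, term1 + term2, bound

rng = random.Random(7)
N = 5000
g = [rng.gauss(0.0, 1.0) for _ in range(N)]
dg = [rng.gauss(0.0, 1.0) for _ in range(N)]
results = []
for n in (1, 10, 100):
    # hypothetical approximations whose errors shrink like 1/n
    gn = [x + rng.gauss(0.0, 1.0) / n for x in g]
    dgn = [x + rng.gauss(0.0, 1.0) / n for x in dg]
    results.append((n,) + split_and_bound(gn, g, dgn, dg))

for n, diff, split, bound in results:
    print(n, diff, split, bound)
```

As the approximation error decreases, the bound (and hence the difference) goes to zero, mirroring the two limits displayed above.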
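Step 3.3.2. In this step, we will prove (22). To see this, note that, by the tower property of conditional expectation,

$$\int_0^T E\left[E\big[g_{(n)}(s,W_0^s,Y_{(n),0}^s)\,\big|\,Y_{(n),0}^T\big]\frac{d}{d\rho}g_{(n)}(s,W_0^s,Y_{(n),0}^s)\right]ds = \int_0^T E\left[E\big[g_{(n)}(s,W_0^s,Y_{(n),0}^s)\,\big|\,Y_{(n),0}^T\big]\,E\left[\frac{d}{d\rho}g_{(n)}(s,W_0^s,Y_{(n),0}^s)\,\Big|\,Y_{(n),0}^T\right]\right]ds,$$

whose convergence to

$$\int_0^T E\left[E\big[g(s,W_0^s,Y_0^s)\,\big|\,Y_0^T\big]\,E\left[\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\,\Big|\,Y_0^T\right]\right]ds$$

can be established using a similar argument as in Step 3.2.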
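The first equality in Step 3.3.2 is the tower property: for a factor $A$ that is measurable with respect to the conditioning variable, $E[A\,D] = E[A\,E[D\,|\,Y]]$. A minimal sketch on a hypothetical finite joint distribution (the tables `joint`, `G` and `D` below are made up for illustration, with `G` and `D` standing in for $g$ and $\frac{d}{d\rho}g$) checks this with exact rational arithmetic:

```python
from fractions import Fraction as F

# Hypothetical joint pmf over (w, y); G and D stand in for g and d/d(rho) g.
joint = {(0, 0): F(1, 8), (0, 1): F(1, 4), (1, 0): F(3, 8), (1, 1): F(1, 4)}
G = {(0, 0): F(1), (0, 1): F(-2), (1, 0): F(3), (1, 1): F(1, 2)}
D = {(0, 0): F(5), (0, 1): F(1), (1, 0): F(-1), (1, 1): F(2)}

def cond_exp(h, y):
    """E[h(W, Y) | Y = y] under the joint pmf."""
    py = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(p * h[(w, yy)] for (w, yy), p in joint.items() if yy == y) / py

# E[ E[G|Y] * D ]  versus  E[ E[G|Y] * E[D|Y] ]
lhs = sum(p * cond_exp(G, y) * D[(w, y)] for (w, y), p in joint.items())
rhs = sum(p * cond_exp(G, y) * cond_exp(D, y) for (w, y), p in joint.items())
print(lhs, rhs, lhs == rhs)
```

Using `Fraction` avoids any rounding, so the two sides agree exactly rather than approximately.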

Step 3.4. Note that Steps 3.1, 3.2 and 3.3 collectively yield (19).

Step 4. In this step, we will prove

$$\lim_{n\to\infty} I(W_0^T;Y_{(n),0}^T) = I(W_0^T;Y_0^T). \tag{23}$$

Obviously, it suffices to prove that

$$\lim_{n\to\infty}\int_0^T E\left[g_{(n)}^2(s,W_0^s,Y_{(n),0}^s)\right]ds = \int_0^T E\left[g^2(s,W_0^s,Y_0^s)\right]ds \tag{24}$$

and

$$\lim_{n\to\infty}\int_0^T E\left[E^2\big[g_{(n)}(s,W_0^s,Y_{(n),0}^s)\,\big|\,Y_{(n),0}^s\big]\right]ds = \int_0^T E\left[E^2\big[g(s,W_0^s,Y_0^s)\,\big|\,Y_0^s\big]\right]ds. \tag{25}$$

Note that (24) has been established in Step 3.1, and (25) can be established
using a parallel argument as in Step 3.2.

Step 5. In this step, we will establish the continuity of the following terms with respect to $\rho$:

$$\int_0^T E\left[g^2(s,W_0^s,Y_0^s)\right]ds,\qquad \int_0^T E\left[E^2\big[g(s,W_0^s,Y_0^s)\,\big|\,Y_0^T\big]\right]ds,$$

and

$$\int_0^T E\left[g(s,W_0^s,Y_0^s)\,\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right]ds,\qquad \int_0^T E\left[E\big[g(s,W_0^s,Y_0^s)\,\big|\,Y_0^T\big]\,\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right]ds.$$

Note that the continuity of $\int_0^T E[g^2(s,W_0^s,Y_0^s)]\,ds$ immediately follows from the dominated convergence theorem together with Condition (d) and the fact that $g(s,W_0^s,Y_0^s)$ is continuous in $\rho$. Moreover, a parallel argument can be used to establish the continuity of

$$\int_0^T E\left[g(s,W_0^s,Y_0^s)\,\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right]ds.$$

To establish the continuity of $\int_0^T E[E^2[g(s,W_0^s,Y_0^s)\,|\,Y_0^T]]\,ds$, it suffices to prove that for any sequence $\{\rho_n\}$ convergent to $\rho$,

$$\lim_{n\to\infty}\int_0^T E\left[E^2\big[g(s,W_0^s,Y_0^{(\rho_n),s})\,\big|\,Y_0^{(\rho_n),T}\big]\right]ds = \int_0^T E\left[E^2\big[g(s,W_0^s,Y_0^{(\rho),s})\,\big|\,Y_0^{(\rho),T}\big]\right]ds,$$

which can be shown by a parallel argument as in Step 3.2, where the following similar convergence is proven:

$$\lim_{n\to\infty}\int_0^T E\left[E^2\big[g_{(n)}(s,W_0^s,Y_{(n),0}^s)\,\big|\,Y_{(n),0}^T\big]\right]ds = \int_0^T E\left[E^2\big[g(s,W_0^s,Y_0^s)\,\big|\,Y_0^T\big]\right]ds.$$

Furthermore, similarly as in Step 3.3.2, the continuity of

$$\int_0^T E\left[E\big[g(s,W_0^s,Y_0^s)\,\big|\,Y_0^T\big]\,\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right]ds$$

can be established as well.

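Step 6. It then follows from (18) that, for any $r > 0$,

$$\begin{aligned}
I(W_0^T;Y_{(n),0}^{(r),T}) &= \int_0^r \frac{d}{d\rho}I(W_0^T;Y_{(n),0}^{(\rho),T})\,d\rho\\
&= \int_0^r \rho\int_0^T\left(E\left[g_{(n)}^2(s,W_0^s,Y_{(n),0}^{(\rho),s})\right] - E\left[E^2\big[g_{(n)}(s,W_0^s,Y_{(n),0}^{(\rho),s})\,\big|\,Y_{(n),0}^{(\rho),T}\big]\right]\right)ds\,d\rho\\
&\quad+ \int_0^r \rho^2\int_0^T E\left[\left(g_{(n)}(s,W_0^s,Y_{(n),0}^{(\rho),s}) - E\big[g_{(n)}(s,W_0^s,Y_{(n),0}^{(\rho),s})\,\big|\,Y_{(n),0}^{(\rho),T}\big]\right)\frac{d}{d\rho}g_{(n)}(s,W_0^s,Y_{(n),0}^{(\rho),s})\right]ds\,d\rho,
\end{aligned}$$

where we have used the superscripts $(\rho)$ and $(r)$ to specify the underlying parameters. It then follows from the dominated convergence theorem that

$$\begin{aligned}
I(W_0^T;Y_0^{(r),T}) &= \int_0^r \rho\int_0^T\left(E\left[g^2(s,W_0^s,Y_0^{(\rho),s})\right] - E\left[E^2\big[g(s,W_0^s,Y_0^{(\rho),s})\,\big|\,Y_0^{(\rho),T}\big]\right]\right)ds\,d\rho\\
&\quad+ \int_0^r \rho^2\int_0^T E\left[\left(g(s,W_0^s,Y_0^{(\rho),s}) - E\big[g(s,W_0^s,Y_0^{(\rho),s})\,\big|\,Y_0^{(\rho),T}\big]\right)\frac{d}{d\rho}g(s,W_0^s,Y_0^{(\rho),s})\right]ds\,d\rho.
\end{aligned}\tag{26}$$

Note that Step 5 has established the continuity of the following terms in $\rho$:

$$\int_0^T\left(E\left[g^2(s,W_0^s,Y_0^{(\rho),s})\right] - E\left[E^2\big[g(s,W_0^s,Y_0^{(\rho),s})\,\big|\,Y_0^{(\rho),T}\big]\right]\right)ds$$

and

$$\int_0^T E\left[\left(g(s,W_0^s,Y_0^{(\rho),s}) - E\big[g(s,W_0^s,Y_0^{(\rho),s})\,\big|\,Y_0^{(\rho),T}\big]\right)\frac{d}{d\rho}g(s,W_0^s,Y_0^{(\rho),s})\right]ds.$$

So the desired formula (15) follows from taking the derivative of (26) with respect to
$r$, and the proof of the theorem is complete.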

Remark 4.8. Theorem 4.1 is indeed included in Theorem 4.3 as a special case. More precisely, the power constraint (12) trivially implies Conditions (b), (c) and (d). Note that Theorem 4.3 still holds true if Condition (f) is replaced by the following somewhat cumbersome condition: for any $n$,


d 

dp~ 


d 


9(n) (®j VF(s), F) n yo) ( dpP^i^V {$) 0 ) j lf°g 2 (t,W(t),Y*)dt<n, a.S., 
which is also implied by (12). So, Theorem 4.3 recovers Theorem 4.1 with a direct and
rigorous proof¹.

Remark 4.9. To show (20), as opposed to our approach in Step 3.2, a possible and seemingly more natural first step is to establish the convergence of $E^2[g_{(n)}(s,W_0^s,Y_{(n),0}^s)\,|\,Y_{(n),0}^T]$ (either in probability or in distribution) as $n$ tends to infinity, which, however, has eluded our multiple attempts. Note that for the above-mentioned convergence, the martingale convergence theorem may not be applicable, since it is not clear whether the $\sigma$-algebra generated by $Y_{(n),0}^T$ gets larger as $n$ increases. Also, in our attempts to prove (22) and (25), similar hurdles were encountered, and parallel arguments as in Step 3.2 had to be used instead. Here, we remark that, in general, the convergence of a sequence of conditional expectations can be rather subtle and challenging; see some positive results in [13] and [6], where some fairly strong assumptions are imposed.

5 Possible Future Directions 

The significant impact of the original I-MMSE relation (2) on non-feedback/memoryless channels presages many possible applications of the extended I-MMSE relations derived in this paper to situations where feedback and/or memory are present; moreover, we envision that our new approach can provide new perspectives for examining a number of aspects of information theory. In this section, we discuss some promising future directions that one can further pursue based on this work. In a nutshell, the possible further directions can be summarized as follows:

1. further extend the I-MMSE relation to colored Gaussian feedback channels, general 
feedback channels, and its limiting version in terms of mutual information rate; 

2. explore the properties of the extended MMSE; 

3. explore the applications of the extended I-MMSE relation to Gaussian feedback channels, multi-user Gaussian channels, and Gaussian channels with input/output memory;

4. explore the applications of our new approach to other information-theoretic quantities, 
higher order derivatives, entropy power inequalities, and so on. 

5.1 Further Extensions of the I-MMSE Relation 

Colored Gaussian feedback channels. The discrete-time I-MMSE relation (2) carries over verbatim to linear vector Gaussian channels [14], and its extensions to more general settings include derivatives with respect to arbitrary parameterizations [30], higher order derivatives [32], and so on. Extensions of the continuous-time I-MMSE relation (13) have been studied as well; representative works include those on fractional Brownian motion noise [9] and on an abstract Wiener space mi si. On the other hand, all the above-mentioned extensions have been confined to scenarios where feedback is absent.

¹For sticklers demanding mathematical rigor and perfection: it is known that there are multiple "missing steps" in the proof of Theorem 4.1 in [14]. For instance, the differentiability of $I(X_0^t;Y_0^t)$ with respect to snr does not seem to be trivial and thereby demands careful justification, which is however absent in [14]; also, from (259) to (270) in the proof of Lemma 5 (a key lemma for the proof of Theorem 4.1), the authors assumed that for a sequence of random variables $X_n$ convergent to 0 almost surely, $\lim_{n\to\infty}E[X_n] = 0$, which is not true in general.

In view of our results on extensions of the I-MMSE relation, one of the possible future 
directions is to further extend the I-MMSE relation to colored Gaussian feedback channels 
in both discrete time and continuous time. 

While the proposed direction is well within reach in discrete time, the same problem appears to be far more challenging in continuous time due to the inherent intractability of continuous-time Gaussian processes. A natural goal in this direction is to find the broadest class of continuous-time Gaussian processes for which the extended I-MMSE relation holds. One special class of Gaussian processes that appears to be tractable consists of those featuring canonical representations [20] (in terms of standard Brownian motions) without discrete spectrum terms (see (6.8.2) of [21]); for such processes, Girsanov's theorem [24], a key technical ingredient in our proofs of Theorems 4.2 and 4.3, can be carried over. Since fractional Brownian motions are a special class of separable Gaussian processes, one would arrive at results that include those in [9] as special cases.

General feedback channels. The exploration of fundamental relationships between information and estimation measures has not been confined to Gaussian channels. As a matter of fact, a considerable amount of work, largely inspired by the I-MMSE relation for Gaussian channels, has been devoted to investigating parallel relations for non-Gaussian channels. In this direction, representative works include additive channels [45], arbitrary channels [31], Poisson channels P51 El HD], and binomial and negative binomial channels [39], [40]. This thread of efforts has culminated in a recent paper [22], where a unified general formula relating information and estimation measures was derived for Lévy channels, which encompass Gaussian channels and a number of other non-Gaussian channels as special cases.

One possible direction is to further generalize the result in [22] to Lévy channels with feedback/memory, in either discrete or continuous time. Alternatively, one can also consider deriving the extended I-MMSE relation for channels featuring noise with jumps (noise of this type naturally arises in a variety of real-life situations). For this direction, it might be wiser to first consider additive Lévy processes (which are different from the Lévy channels in [22] in spite of the same name), which have been extensively studied in both mathematical theory and practical applications. Note that such an extension, if successful, would generalize the one in [10], which only deals with pure jump processes. A key ingredient for success would be an "explicit" Girsanov-type theorem for Lévy processes.

Limiting version. For most non-degenerate channels with feedback/memory, the capacity is computed by maximizing the (directed) mutual information rate, rather than the mutual information. This fact necessitates the consideration of the limiting version of the extended I-MMSE relation in discrete time as $n$ tends to infinity.

There are hurdles along this direction. First of all, not all input processes guarantee that the mutual information rate is well-defined. Another issue is the differentiability/smoothness/analyticity of the mutual information rate, which may fail for certain channels mm-. So, it makes sense to focus on identifying channels with explicit and reasonable assumptions on the input process that ensure the existence of the mutual information rate and its derivative.

Probably a feasible first step is to examine Gaussian channels with Markovian input processes: at least for discrete-time Gaussian channels with ARMA noise, the capacity is achieved by Markovian input processes [23]. Moreover, for certain Gaussian channels with a finite input alphabet, the analyticity/smoothness/asymptotics of the mutual information rate have been established [19].

5.2 Properties of the Extended MMSE 

Properties of the discrete-time MMSE associated with Gaussian non-feedback channels, such as monotonicity, continuity, smoothness, analyticity, concavity and asymptotics, have been extensively studied mm-. These properties have been utilized in a wide range of applications; in particular, the following two properties [17] of the MMSE are of great interest and of direct use in deriving the capacity regions of some multi-user Gaussian channels, such as Gaussian wiretap channels [3] and Gaussian broadcast channels [3]:

• Gaussian inputs are the hardest to estimate, which means that any non-Gaussian input 
yields strictly smaller MMSE than a Gaussian input of the same variance; 

• The single-crossing property, which, roughly speaking, says that a Gaussian MMSE 
curve (with respect to the snr ) only intersects with a non-Gaussian MMSE curve at 
most once. 
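The first property can be illustrated numerically in the scalar, non-feedback setting $Y=\sqrt{\mathrm{snr}}\,X+N$. A minimal sketch, assuming a unit-variance equiprobable $\pm1$ input as the non-Gaussian example (an illustrative choice, for which $\mathrm{mmse}(\mathrm{snr}) = 1-E[\tanh(\mathrm{snr}+\sqrt{\mathrm{snr}}\,N)]$ with $N\sim\mathcal{N}(0,1)$) and the standard closed form $1/(1+\mathrm{snr})$ for a standard Gaussian input, compares the two curves:

```python
import math

def mmse_gaussian(snr):
    # standard Gaussian input: mmse = 1 / (1 + snr)
    return 1.0 / (1.0 + snr)

def mmse_bpsk(snr, h=0.005, span=12.0):
    # equiprobable +/-1 input: mmse = 1 - E[tanh(snr + sqrt(snr) * N)], N ~ N(0, 1),
    # with the expectation computed by the trapezoid rule on [-span, span]
    k = int(round(2 * span / h))
    total = 0.0
    for i in range(k + 1):
        z = -span + i * h
        weight = 0.5 if i in (0, k) else 1.0
        total += weight * math.exp(-z * z / 2) * math.tanh(snr + math.sqrt(snr) * z)
    return 1.0 - total * h / math.sqrt(2 * math.pi)

for snr in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(snr, mmse_bpsk(snr), mmse_gaussian(snr))
```

Since $1/(1+\mathrm{snr})$ is also the linear MMSE for any unit-variance input, the gap printed here is the improvement that the nonlinear estimator $\tanh(\sqrt{\mathrm{snr}}\,Y)$ achieves over the best linear one.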

Naturally, one may consider exploring whether, or to what extent, these properties hold for the extended MMSE in both discrete and continuous time. It is clear that, for the extended MMSE, whether these two properties hold depends on the adopted encoding schemes, which points to a natural future direction: to explore for which encoding schemes these two properties hold for the extended MMSE. In this direction, one reasonable candidate would be Gaussian channels with linear feedback encoding schemes; see, e.g., (ME].

5.3 Applications to Gaussian Feedback Channels 

Despite extensive efforts spent on colored Gaussian feedback channels, the capacity of such channels has largely remained unknown, except for some special cases [23]. The extended I-MMSE relations may help deepen our understanding of colored Gaussian feedback channels. First, notice that an application of the Cauchy-Schwarz inequality yields that the correctional term of an extended MMSE can be upper bounded by the MMSE term, up to a multiplicative constant. Since the MMSE term "corresponds" to Gaussian channels without feedback, it is plausible to at least derive some bound [12] (which may depend on the signal-to-noise ratio) on the ratio of the feedback capacity to the non-feedback capacity. Second, being written as the sum of an MMSE term and a correctional term, an extended MMSE can be of great help, in both discrete and continuous time, in describing the asymptotic behavior [8] of the feedback capacity in the regime where snr is small or large.

While deriving the capacity of a general colored Gaussian feedback channel seems far-fetched, one may consider making use of the extended I-MMSE relations to derive the feedback capacity of some special colored Gaussian feedback channels. It is well known (see, e.g., Ihara [21]) that for colored Gaussian feedback channels, linear feedback schemes are sufficient to achieve the capacity. This fact considerably boosts the chance of deriving the exact capacity using the extended I-MMSE relation: under a linear feedback encoding scheme, the inputs and the outputs are de facto jointly Gaussian, which means that both the MMSE and the correctional terms can be explicitly computed. Note that the above-mentioned idea is particularly promising when the Gaussian noise is Markovian, which implies that the correctional term is a scaled version of the MMSE term, and further the desired property that the extended MMSE is maximized by a Gaussian distribution.

5.4 Applications to Multi-User Gaussian Channels 

Discrete-time. The original I-MMSE relation has been applied to discrete-time multi-user non-feedback Gaussian channels, including Gaussian broadcast channels, wiretap channels, interference channels and so on. Naturally, one tempting direction is to explore possible applications of the extended I-MMSE relation to discrete-time multi-user Gaussian channels with feedback. For this purpose, one of the immediate problems is to identify those multi-user Gaussian channels for which linear feedback coding schemes achieve the capacity regions. Alternatively, one can also look into whether a "multi-user" version of the extended I-MMSE relation exists, which may involve conditional mutual information with multiple message sets. As might be expected, such a multi-user extended I-MMSE relation can provide more insights into the interactions among the users.

Continuous-time. Recently, the infinite bandwidth capacity regions of a continuous-time white Gaussian multiple access channel with/without feedback, a continuous-time white Gaussian interference channel without feedback and a continuous-time white Gaussian broadcast channel without feedback have been derived in [25]. The continuous-time I-MMSE relation has been applied to derive the capacity region of continuous-time white Gaussian broadcast channels. It is natural to further extend the above-mentioned results and derive the capacity regions of more general Gaussian multi-user channels with feedback; the extended I-MMSE formulas might be of great help in deriving the capacity region of continuous-time white Gaussian broadcast channels with feedback, or even of more general continuous-time multi-user channels.

5.5 Applications to Gaussian Memory Channels 

It is conceivable that our extended I-MMSE relations for channels with memory may help us further understand Gaussian memory channels, which are suitable for modeling some storage systems, such as flash memories [1]. To be more precise, we believe that such extended relations will be helpful for estimating/computing the capacity (region) of (multi-user) Gaussian channels with input/output memory.

5.6 Applications of Our New Approach 

Other than the extended I-MMSE relations, one may also consider whether/how the proposed 
new approach for deriving the extended I-MMSE relation can be applied elsewhere. Below 
is a list of several scenarios where it can be instrumental. 

Other information-theoretic quantities. Other than recovering and extending the original I-MMSE relation, the approach proposed in this paper may be further applied to study other information-theoretic quantities as well, as evidenced by the simple and direct proof (see Section 2.2) of the classical de Bruijn's identity [38], [5]. It is our opinion that investigating whether our approach can be applied elsewhere, particularly in situations where the derivatives of certain information-theoretic quantities are needed, is highly likely to bear fruit. Here, we remark that the derivative of relative entropy has been examined for channels involving mismatched estimation without feedback; see [Til 43j.

Higher order derivatives. The second order derivatives of the mutual information and the entropy power function have also been computed in pX 32], which, among many other applications, have played a key role in understanding the concavity of the mutual information and in deriving entropy power inequalities for Gaussian channels |5l d [321133]. We expect that such results can be extended to Gaussian feedback channels. Rough computations suggest that the framework of our approach can also be applied to compute higher order derivatives explicitly. Beyond understanding concavity, such explicit expressions can also help to characterize the asymptotic behavior of the mutual information and entropy power function associated with Gaussian feedback channels. In this direction, some Taylor-series-expansion-like formulae seem to be within reach, which will undoubtedly yield a finer characterization of the behavior of the mutual information and entropy power function of Gaussian feedback channels.

Entropy power inequalities. The ideas and techniques in the proof of the original I-MMSE relation have been used to give new and simpler proofs of a number of entropy power inequalities 02 associated with Gaussian non-feedback channels. It is certainly worthwhile to look into whether these inequalities can be extended to Gaussian feedback channels using our new approach. Obviously, the same questions can be asked in the continuous-time setting, which, however, appears to be much more challenging.

Acknowledgement. We would like to thank Dongning Guo, Young-Han Kim and Tsachy Weissman for insightful suggestions and comments, and for pointing out relevant references.


Appendices 
A Key Lemmas 

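The following two well-known lemmas are the main tools that will be used to justify the interchanges between a differentiation and an integration in this paper; for their proofs, see [m, Theorem A.5.1, Theorem A.5.2].

Lemma A.1. Let $f(x,\theta)$ be a continuously differentiable function with respect to $\theta$ and let $X$ be a random variable. Let $\varepsilon > 0$ and suppose that

(i) $u(\theta) = E[f(X,\theta)] < \infty$ for all $\theta \in (\theta_0-\varepsilon,\theta_0+\varepsilon)$,

(ii) $v(\theta) = E\left[\frac{\partial}{\partial\theta}f(X,\theta)\right]$ is continuous at $\theta = \theta_0$, and

(iii) $E\left[\int_{\theta_0-\varepsilon}^{\theta_0+\varepsilon}\left|\frac{\partial}{\partial\theta}f(X,\theta)\right|d\theta\right] < \infty$;

then we have $u'(\theta_0) = v(\theta_0)$, i.e.,

$$\frac{d}{d\theta}E[f(X,\theta)]\bigg|_{\theta=\theta_0} = E\left[\frac{\partial}{\partial\theta}f(X,\theta)\bigg|_{\theta=\theta_0}\right].$$

The following lemma is a direct consequence of the above one.

Lemma A.2. Let $f(x,\theta)$ be a continuously differentiable function with respect to $\theta$ and let $X$ be a random variable. Let $\varepsilon > 0$ and suppose that

(i) $u(\theta) = E[f(X,\theta)] < \infty$ for all $\theta \in (\theta_0-\varepsilon,\theta_0+\varepsilon)$, and

(ii) $E\left[\sup_{\theta\in(\theta_0-\varepsilon,\theta_0+\varepsilon)}\left|\frac{\partial}{\partial\theta}f(X,\theta)\right|\right] < \infty$;

then we have $u'(\theta_0) = v(\theta_0)$, with $v$ defined as in Lemma A.1, i.e.,

$$\frac{d}{d\theta}E[f(X,\theta)]\bigg|_{\theta=\theta_0} = E\left[\frac{\partial}{\partial\theta}f(X,\theta)\bigg|_{\theta=\theta_0}\right].$$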
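As a concrete instance of Lemma A.2, consider the illustrative choice $f(x,\theta)=e^{-(x-\theta)^2/2}$ with $X\sim\mathcal{N}(0,1)$ (not an example from the text). Since $|\partial f/\partial\theta| = |x-\theta|e^{-(x-\theta)^2/2}\le e^{-1/2}$, condition (ii) holds, and completing the square gives $u(\theta)=E[f(X,\theta)]=e^{-\theta^2/4}/\sqrt{2}$. The sketch below compares $u'(\theta_0)$ with a quadrature of $v(\theta_0)=E[\partial f/\partial\theta(X,\theta_0)]$:

```python
import math

def expect(func, h=0.001, span=12.0):
    # E[func(X)] for X ~ N(0, 1) via the trapezoid rule on [-span, span]
    k = int(round(2 * span / h))
    total = 0.0
    for i in range(k + 1):
        x = -span + i * h
        weight = 0.5 if i in (0, k) else 1.0
        total += weight * func(x) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

def u(theta):
    # closed form of E[exp(-(X - theta)^2 / 2)]
    return math.exp(-theta ** 2 / 4) / math.sqrt(2)

def u_prime(theta):
    return -theta / 2 * u(theta)

theta0 = 0.8
# v(theta0) = E[(X - theta0) * exp(-(X - theta0)^2 / 2)]
v = expect(lambda x: (x - theta0) * math.exp(-(x - theta0) ** 2 / 2))
print(u_prime(theta0), v)
```

Here the uniform bound $e^{-1/2}$ on the derivative is what makes the supremum in condition (ii) trivially integrable.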


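B Justifications for the interchanges in Section 2.1

1. We first prove that for any $\rho_0 \in \mathbb{R}$,

$$\frac{d}{d\rho}\int_{\mathbb{R}} f_{Y|X}(y|x)f_X(x)\,dx\bigg|_{\rho=\rho_0} = \int_{\mathbb{R}}\frac{d}{d\rho}f_{Y|X}(y|x)f_X(x)\,dx\bigg|_{\rho=\rho_0},$$

or equivalently, we prove that for any $\rho_0 \in \mathbb{R}$ and for any $x', z' \in \mathbb{R}$,

$$\frac{d}{d\rho}\int_{\mathbb{R}} f_{Y|X}(\rho x'+z'|x)f_X(x)\,dx\bigg|_{\rho=\rho_0} = \int_{\mathbb{R}}\frac{d}{d\rho}f_{Y|X}(\rho x'+z'|x)f_X(x)\,dx\bigg|_{\rho=\rho_0}. \tag{27}$$

In what follows, fix $x', z' \in \mathbb{R}$ and $\varepsilon > 0$. Straightforward computations yield that for all $\rho \in (\rho_0-\varepsilon,\rho_0+\varepsilon)$,

$$\int_{\mathbb{R}} f_{Y|X}(\rho x'+z'|x)f_X(x)\,dx \le \int_{\mathbb{R}}\frac{1}{\sqrt{2\pi}}f_X(x)\,dx \le \frac{1}{\sqrt{2\pi}},$$

and moreover,

$$\frac{d}{d\rho}f_{Y|X}(\rho x'+z'|x) = \frac{d}{d\rho}\frac{1}{\sqrt{2\pi}}e^{-(\rho x'-\rho x+z')^2/2} = -\frac{1}{\sqrt{2\pi}}e^{-(\rho x'-\rho x+z')^2/2}(\rho x'-\rho x+z')(x'-x),$$

which, together with the elementary bound $|u|e^{-u^2/2}\le e^{-1/2}$ and the assumption that $E[X^2] < \infty$, immediately implies that

$$\int_{\mathbb{R}}\sup_{\rho\in(\rho_0-\varepsilon,\rho_0+\varepsilon)}\left|\frac{d}{d\rho}f_{Y|X}(\rho x'+z'|x)f_X(x)\right|dx \le \int_{\mathbb{R}}\frac{|x'-x|}{\sqrt{2\pi e}}f_X(x)\,dx < \infty.$$

The interchange as in (27) then immediately follows from an invocation of Lemma A.2.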
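The closed-form derivative $\frac{d}{d\rho}f_{Y|X}(\rho x'+z'|x) = -\frac{1}{\sqrt{2\pi}}e^{-u^2/2}\,u\,(x'-x)$ with $u=\rho x'-\rho x+z'$, together with the elementary bound $|u|e^{-u^2/2}\le e^{-1/2}$ that makes the supremum integrable, can be spot-checked numerically. A minimal sketch, with arbitrary illustrative values of $x$, $x'$, $z'$ and $\rho$ (assumptions, not from the text), compares the formula against a central finite difference:

```python
import math

def f_cond(rho, x, xp, zp):
    # f_{Y|X}(rho*x' + z' | x): standard normal density at rho*x' - rho*x + z'
    u = rho * xp - rho * x + zp
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def f_cond_deriv(rho, x, xp, zp):
    # closed form: -(1/sqrt(2 pi)) * exp(-u^2/2) * u * (x' - x)
    u = rho * xp - rho * x + zp
    return -math.exp(-u * u / 2) / math.sqrt(2 * math.pi) * u * (xp - x)

max_err = 0.0
worst_ratio = 0.0
d = 1e-5
for rho in (0.3, 1.0, 2.5):
    for x in (-2.0, 0.0, 1.7):
        for xp, zp in ((0.5, -0.4), (-1.2, 2.0)):
            fd = (f_cond(rho + d, x, xp, zp) - f_cond(rho - d, x, xp, zp)) / (2 * d)
            max_err = max(max_err, abs(fd - f_cond_deriv(rho, x, xp, zp)))
            # uniform bound: |d/d rho f| <= e^{-1/2} |x' - x| / sqrt(2 pi)
            bound = math.exp(-0.5) * abs(xp - x) / math.sqrt(2 * math.pi)
            worst_ratio = max(worst_ratio, abs(f_cond_deriv(rho, x, xp, zp)) / bound)
print(max_err, worst_ratio)
```

A `worst_ratio` at most 1 confirms that the derivative is dominated, uniformly in $\rho$, by a multiple of $|x'-x|$, which is integrable against $f_X$ whenever $E[|X|]<\infty$.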
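2. We next prove that for any $\rho_0 \in \mathbb{R}$,

$$\frac{d}{d\rho}E[\log f_Y(Y)]\bigg|_{\rho=\rho_0} = E\left[\frac{d}{d\rho}\log f_Y(Y)\bigg|_{\rho=\rho_0}\right]. \tag{28}$$

Note that, by the assumption that $E[X^2] < \infty$, we have

$$E[Y^2] = \rho^2E[X^2] + E[N^2] < \infty,$$

which immediately implies the finiteness of $E[\log f_Y(Y)]$ for all $\rho \in (\rho_0-\varepsilon,\rho_0+\varepsilon)$. As in Section 2.1, we have

$$E\left[\frac{d}{d\rho}\log f_Y(Y)\right] = -\rho E\left[(X-E[X|Y])^2\right] = -\rho\left(E[X^2]-E\left[E^2[X|Y]\right]\right),$$

which means that, to prove the continuity of $E\left[\frac{d}{d\rho}\log f_Y(Y)\right]$ at $\rho = \rho_0$, it suffices to prove that of $E[E^2[X|Y]]$ at $\rho = \rho_0$.

As a matter of fact, we will prove the aforementioned continuity at any $\rho$. We first show that

$$E[X|Y] = \frac{1}{f_Y(Y)}\int_{\mathbb{R}} x\,\frac{1}{\sqrt{2\pi}}e^{-(Y-\rho x)^2/2}f_X(x)\,dx$$

is continuous in $\rho$. To see this, note that for any $\rho$, we have

$$\left|x\,\frac{1}{\sqrt{2\pi}}e^{-(Y-\rho x)^2/2}f_X(x)\right| \le \frac{1}{\sqrt{2\pi}}|x|f_X(x),$$

of which the right-hand side is integrable. It then follows from the fact that $x\,e^{-(Y-\rho x)^2/2}f_X(x)$ is continuous at any $\rho$ and the dominated convergence theorem that

$$\int_{\mathbb{R}} x\,\frac{1}{\sqrt{2\pi}}e^{-(Y-\rho x)^2/2}f_X(x)\,dx$$

is continuous in $\rho$. A similar argument can be applied to show that $f_Y(Y)$ is also continuous in $\rho$, which immediately implies the continuity of $E[X|Y]$ in $\rho$.

We are now ready to show that $E[E^2[X|Y]]$ is continuous in $\rho$. To see this, note that it follows from $E[X^2] < \infty$ that $\{E[X^2|Y], \rho > 0\}$ forms a family of uniformly integrable random variables. This, together with the fact that $E^2[X|Y] \le E[X^2|Y]$, implies that $\{E^2[X|Y], \rho > 0\}$ also forms a family of uniformly integrable random variables. The continuity of $E[E^2[X|Y]]$ then follows from that of $E[X|Y]$ and the uniform integrability of $\{E^2[X|Y], \rho > 0\}$.

Moreover, it can be readily verified that

$$\begin{aligned}
E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\left|\frac{d}{d\rho}\log f_Y(Y)\right|d\rho\right]
&= E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\left|\int_{\mathbb{R}}(Y-\rho x)(X-x)f_{X|Y}(x|Y)\,dx\right|d\rho\right]\\
&\le E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\int_{\mathbb{R}}\left|(Y-\rho x)(X-x)\right|f_{X|Y}(x|Y)\,dx\,d\rho\right]\\
&\le E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\int_{\mathbb{R}}\left(|YX|+|Yx|+|\rho||xX|+|\rho|x^2\right)f_{X|Y}(x|Y)\,dx\,d\rho\right]\\
&= \int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}E\Big[|YX|+|Y|\,E[|X|\,|\,Y]+|\rho||X|\,E[|X|\,|\,Y]+|\rho|\,E[X^2|Y]\Big]\,d\rho\\
&= \int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\Big(2E[|YX|]+|\rho|\,E\big[E^2[|X|\,|\,Y]\big]+|\rho|\,E[X^2]\Big)\,d\rho\\
&\le \int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\Big(E[Y^2]+E[X^2]+2|\rho|\,E[X^2]\Big)\,d\rho,
\end{aligned}$$

which is finite due to the assumption that $E[X^2] < \infty$ and the fact that $E[Y^2] = \rho^2E[X^2]+E[N^2]$ is bounded for $\rho \in (\rho_0-\varepsilon,\rho_0+\varepsilon)$. So, by Lemma A.1, we can switch the integration and differentiation as in (28), and therefore

$$\frac{d}{d\rho}I(X;Y) = -\frac{d}{d\rho}E[\log f_Y(Y)] = -E\left[\frac{d}{d\rho}\log f_Y(Y)\right] = \rho E\left[(X-E[X|Y])^2\right].$$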
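The identity $\frac{d}{d\rho}I(X;Y)=\rho E[(X-E[X|Y])^2]$ for $Y=\rho X+N$ can be checked numerically. A minimal sketch, assuming an equiprobable $\pm1$ input (an illustrative choice, for which $E[X|Y]=\tanh(\rho Y)$), computes $I(X;Y)=h(Y)-h(N)$ in nats by quadrature and compares a central finite difference of $I$ with $\rho$ times the MMSE:

```python
import math

def phi(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def f_y(y, rho):
    # density of Y = rho*X + N with X uniform on {-1, +1} and N ~ N(0, 1)
    return 0.5 * (phi(y - rho) + phi(y + rho))

def integrate(func, rho, h=0.005, span=15.0):
    # trapezoid rule over [-rho - span, rho + span]
    lo, hi = -rho - span, rho + span
    k = int(round((hi - lo) / h))
    step = (hi - lo) / k
    total = sum(func(lo + i * step) for i in range(1, k)) + 0.5 * (func(lo) + func(hi))
    return total * step

def mutual_info(rho):
    # I(X;Y) = h(Y) - h(N)
    h_y = integrate(lambda y: -f_y(y, rho) * math.log(f_y(y, rho)), rho)
    return h_y - 0.5 * math.log(2 * math.pi * math.e)

def mmse(rho):
    # E[(X - E[X|Y])^2] = 1 - E[tanh(rho*Y)^2]
    return 1.0 - integrate(lambda y: f_y(y, rho) * math.tanh(rho * y) ** 2, rho)

results = []
for rho in (0.5, 1.0, 2.0):
    d = 1e-4
    slope = (mutual_info(rho + d) - mutual_info(rho - d)) / (2 * d)
    results.append((rho, slope, rho * mmse(rho)))
    print(rho, slope, rho * mmse(rho))
```

The trapezoid rule is used because the integrands are smooth and decay like Gaussians, so a uniform grid over a wide window is already very accurate.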
C Justifications for the interchanges in Section 2.2

We need to verify that for any $t_0 > 0$,

$$\frac{d}{dt}\int_{\mathbb{R}} f_{Y|X}(Y|x)f_X(x)\,dx\bigg|_{t=t_0} = \int_{\mathbb{R}}\frac{d}{dt}f_{Y|X}(Y|x)f_X(x)\,dx\bigg|_{t=t_0},$$

or equivalently, that for any $t_0 > 0$ and for any $x', z' \in \mathbb{R}$,

$$\frac{d}{dt}\int_{\mathbb{R}} f_{Y|X}(x'+\sqrt{t}z'|x)f_X(x)\,dx\bigg|_{t=t_0} = \int_{\mathbb{R}}\frac{d}{dt}f_{Y|X}(x'+\sqrt{t}z'|x)f_X(x)\,dx\bigg|_{t=t_0},$$

which follows from a parallel argument as in the proof of (27). We also need to verify that for any $t_0 > 0$,

$$\frac{d}{dt}E[\log f_Y(Y)]\bigg|_{t=t_0} = E\left[\frac{d}{dt}\log f_Y(Y)\bigg|_{t=t_0}\right],$$

which follows from a parallel argument as in the proof of (28).
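For reference, the classical de Bruijn identity states $\frac{d}{dt}h(X+\sqrt{t}N)=\frac{1}{2}J(X+\sqrt{t}N)$, where $J$ denotes Fisher information. A minimal numerical sketch, assuming an equiprobable $\pm1$ input (an illustrative choice), compares a central finite difference of $h(Y)$ with half the Fisher information of $Y=X+\sqrt{t}N$, both computed by quadrature:

```python
import math

def f_y(y, t):
    # density of Y = X + sqrt(t) N, X uniform on {-1, +1}, N ~ N(0, 1)
    c = 1.0 / math.sqrt(2 * math.pi * t)
    return 0.5 * c * (math.exp(-(y - 1) ** 2 / (2 * t)) + math.exp(-(y + 1) ** 2 / (2 * t)))

def f_y_prime(y, t):
    # derivative of f_y with respect to y
    c = 1.0 / math.sqrt(2 * math.pi * t)
    return -0.5 * c * ((y - 1) / t * math.exp(-(y - 1) ** 2 / (2 * t))
                       + (y + 1) / t * math.exp(-(y + 1) ** 2 / (2 * t)))

def integrate(func, t, h=0.002, n_sd=14.0):
    # trapezoid rule over [-1 - n_sd*sqrt(t), 1 + n_sd*sqrt(t)]
    lo, hi = -1 - n_sd * math.sqrt(t), 1 + n_sd * math.sqrt(t)
    k = int(round((hi - lo) / h))
    step = (hi - lo) / k
    total = sum(func(lo + i * step) for i in range(1, k)) + 0.5 * (func(lo) + func(hi))
    return total * step

def entropy(t):
    return integrate(lambda y: -f_y(y, t) * math.log(f_y(y, t)), t)

def fisher(t):
    return integrate(lambda y: f_y_prime(y, t) ** 2 / f_y(y, t), t)

results = []
for t in (0.5, 1.0, 2.0):
    d = 1e-5
    slope = (entropy(t + d) - entropy(t - d)) / (2 * d)
    results.append((t, slope, 0.5 * fisher(t)))
    print(t, slope, 0.5 * fisher(t))
```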


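D Justifications for the interchanges in the Proof of Theorem 3.1

In this section, we fix $\varepsilon > 0$, and we sometimes write $g_i(W_1^i,Y_1^{i-1})$ as $g_i$ for notational simplicity.

1. We first prove that for any $\rho_0 \in \mathbb{R}_+$, with probability 1,

$$\frac{d}{d\rho}\int_{\mathbb{R}^n} f(Y_1^n|w_1^n)f(w_1^n)\,dw_1^n\bigg|_{\rho=\rho_0} = \int_{\mathbb{R}^n}\frac{d}{d\rho}f(Y_1^n|w_1^n)f(w_1^n)\,dw_1^n\bigg|_{\rho=\rho_0}. \tag{29}$$

It follows from straightforward computations that for all $\rho \in (\rho_0-\varepsilon,\rho_0+\varepsilon)$,

$$\int_{\mathbb{R}^n} f(Y_1^n|w_1^n)f(w_1^n)\,dw_1^n \le \frac{1}{(\sqrt{2\pi})^n}.$$

Moreover, since $f(Y_1^n|w_1^n) = \prod_{i=1}^n\frac{1}{\sqrt{2\pi}}e^{-(Y_i-\rho g_i(w_1^i,Y_1^{i-1}))^2/2}$, we have

$$\frac{d}{d\rho}f(Y_1^n|w_1^n) = -f(Y_1^n|w_1^n)\sum_{i=1}^n\big(Y_i-\rho g_i(w_1^i,Y_1^{i-1})\big)\left(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})+\rho\frac{d}{d\rho}\big(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})\big)\right).$$

It then follows from (8) that, with probability 1,

$$\int_{\mathbb{R}^n}\sup_{\rho\in[\rho_0-\varepsilon,\rho_0+\varepsilon]}\left|\frac{d}{d\rho}f(Y_1^n|w_1^n)f(w_1^n)\right|dw_1^n < \infty.$$

The interchange as in (29) then immediately follows from an invocation of Lemma A.2.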
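In this interchange the outputs $Y_1^n$ themselves move with $\rho$ through the feedback, so the derivative of $f(Y_1^n|w_1^n)=\prod_i\phi\big(Y_i-\rho g_i(w_1^i,Y_1^{i-1})\big)$ picks up the terms $g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})+\rho\frac{d}{d\rho}\big(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})\big)$. A minimal sketch for $n=2$, assuming the hypothetical encoder $g_1(w_1)=w_1$, $g_2(w_1^2,y_1)=w_2\tanh(y_1)$ and fixed illustrative draws of $W_1^2$, $Z_1^2$ and $w_1^2$ (none of these come from the text), compares this product-rule expression with a central finite difference in $\rho$:

```python
import math

def phi(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

# Hypothetical encoder: g1(w) = w1, g2(w, y1) = w2 * tanh(y1).
W = (0.7, -1.3)   # transmitted message (fixed draw)
Z = (0.4, -0.9)   # channel noise (fixed draw)
w = (-0.5, 0.8)   # integration variable in f(Y_1^2 | w_1^2)

def outputs(rho):
    # Y1 = rho*g1(W) + Z1, Y2 = rho*g2(W, Y1) + Z2 (feedback through Y1)
    y1 = rho * W[0] + Z[0]
    y2 = rho * W[1] * math.tanh(y1) + Z[1]
    return y1, y2

def cond_density(rho):
    # f(Y_1^2 | w_1^2) evaluated at the rho-dependent outputs
    y1, y2 = outputs(rho)
    return phi(y1 - rho * w[0]) * phi(y2 - rho * w[1] * math.tanh(y1))

def formula(rho):
    # -f * sum_i (Y_i - rho g_i(w)) * (g_i(W) - g_i(w) + rho d/d rho (g_i(W) - g_i(w)))
    y1, y2 = outputs(rho)
    dy1 = W[0]                                    # dY1/d rho
    sech2 = 1.0 / math.cosh(y1) ** 2
    u1 = y1 - rho * w[0]
    du1 = W[0] - w[0]                             # g1 does not depend on Y
    u2 = y2 - rho * w[1] * math.tanh(y1)
    du2 = (W[1] - w[1]) * math.tanh(y1) + rho * (W[1] - w[1]) * sech2 * dy1
    return -cond_density(rho) * (u1 * du1 + u2 * du2)

results = []
for rho in (0.5, 1.2, 2.0):
    d = 1e-6
    fd = (cond_density(rho + d) - cond_density(rho - d)) / (2 * d)
    results.append((rho, fd, formula(rho)))
    print(rho, fd, formula(rho))
```

The nonlinear dependence of $g_2$ on both $w_2$ and $y_1$ is deliberate: for encoders that are additive in $w$ and $y$, the $\rho\frac{d}{d\rho}(\cdot)$ correction would cancel and the check would be less informative.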
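2. We next prove that for any $\rho_0 \in \mathbb{R}_+$,

$$\frac{d}{d\rho}E[\log f(Y_1^n)]\bigg|_{\rho=\rho_0} = E\left[\frac{d}{d\rho}\log f(Y_1^n)\bigg|_{\rho=\rho_0}\right]. \tag{30}$$

Note that, by (8), we have for all $\rho \in [\rho_0-\varepsilon,\rho_0+\varepsilon]$ and for all $i$,

$$E[Y_i^2] = \rho^2E\left[g_i^2(W_1^i,Y_1^{i-1})\right] + E[Z_i^2] \le \rho^2E\left[\sup_{\rho\in[\rho_0-\varepsilon,\rho_0+\varepsilon]}g_i^2(W_1^i,Y_1^{i-1})\right] + E[Z_i^2] < \infty,$$

which implies that $H(Y_i)$ is upper bounded. On the other hand, it follows from

$$H(Y_i) \ge H(Y_i|Y_1^{i-1},W_1^i) = H(Z_i)$$

that $H(Y_i)$ is lower bounded, and so we have obtained the finiteness of $E[\log f(Y_1^n)]$. As in the proof of Theorem 3.1, we have

$$\begin{aligned}
E\left[\frac{d}{d\rho}\log f(Y_1^n)\right] &= -\rho\sum_{i=1}^n E\left[(g_i-E[g_i|Y_1^n])^2\right] - \rho^2\sum_{i=1}^n E\left[(g_i-E[g_i|Y_1^n])\frac{d}{d\rho}g_i\right]\\
&= -\rho\sum_{i=1}^n\left(E[g_i^2]-E\left[E^2[g_i|Y_1^n]\right]\right) - \rho^2\sum_{i=1}^n\left(E\left[g_i\frac{d}{d\rho}g_i\right]-E\left[E[g_i|Y_1^n]\frac{d}{d\rho}g_i\right]\right).
\end{aligned}$$

So, to prove the continuity of $E\left[\frac{d}{d\rho}\log f(Y_1^n)\right]$ at $\rho = \rho_0$, it suffices to prove that of

$$E\left[g_i^2(W_1^i,Y_1^{i-1})\right],\quad E\left[E^2[g_i(W_1^i,Y_1^{i-1})|Y_1^n]\right],\quad E\left[g_i(W_1^i,Y_1^{i-1})\frac{d}{d\rho}g_i(W_1^i,Y_1^{i-1})\right],\quad E\left[E[g_i(W_1^i,Y_1^{i-1})|Y_1^n]\frac{d}{d\rho}g_i(W_1^i,Y_1^{i-1})\right]$$

at $\rho = \rho_0$. With Condition (8) and the fact that for all feasible $i$, $g_i(W_1^i,Y_1^{i-1})$ is continuous in $\rho$, the continuity of $E[g_i^2(W_1^i,Y_1^{i-1})]$ immediately follows from the dominated convergence theorem. Similarly, it can also be verified that

$$E\left[\sup_{\rho\in[\rho_0-\varepsilon,\rho_0+\varepsilon]}\left|g_i(W_1^i,Y_1^{i-1})\frac{d}{d\rho}g_i(W_1^i,Y_1^{i-1})\right|\right] < \infty,$$

which implies the continuity of $E\left[g_i(W_1^i,Y_1^{i-1})\frac{d}{d\rho}g_i(W_1^i,Y_1^{i-1})\right]$. Moreover, a similar argument as in Section B can be used to establish the continuity of $E[E^2[g_i(W_1^i,Y_1^{i-1})|Y_1^n]]$ and $E\left[E[g_i(W_1^i,Y_1^{i-1})|Y_1^n]\frac{d}{d\rho}g_i(W_1^i,Y_1^{i-1})\right]$ in $\rho$. We then obtain the continuity of $E\left[\frac{d}{d\rho}\log f(Y_1^n)\right]$ as desired.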


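Moreover, we verify that

$$\begin{aligned}
&E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\left|\frac{d}{d\rho}\log f(Y_1^n)\right|d\rho\right]\\
&\quad= E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\left|\int_{\mathbb{R}^n}\sum_{i=1}^n\big(Y_i-\rho g_i(w_1^i,Y_1^{i-1})\big)\left(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})+\rho\frac{d}{d\rho}\big(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})\big)\right)f(w_1^n|Y_1^n)\,dw_1^n\right|d\rho\right]\\
&\quad\le E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\int_{\mathbb{R}^n}\sum_{i=1}^n\left|\big(Y_i-\rho g_i(w_1^i,Y_1^{i-1})\big)\left(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})+\rho\frac{d}{d\rho}\big(g_i(W_1^i,Y_1^{i-1})-g_i(w_1^i,Y_1^{i-1})\big)\right)\right|f(w_1^n|Y_1^n)\,dw_1^n\,d\rho\right]\\
&\quad< \infty,
\end{aligned}$$

where the finiteness follows from (8). So, by Lemma A.1, the integration and differentiation in (30) can be interchanged.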
E Justifications for the interchanges in the Proof of Theorem 4.2

In this section, let $\varepsilon > 0$, and we sometimes write $g(s,W_0^s,Y_0^s)$ as $g(s)$ for notational simplicity.

1. We first prove that for any $\rho_0 \in \mathbb{R}_+$,

$$\frac{d}{d\rho}\int_0^T E\left[g^2(s,W_0^s,Y_0^s)\right]ds\bigg|_{\rho=\rho_0} = 2\int_0^T E\left[g(s,W_0^s,Y_0^s)\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right]ds\bigg|_{\rho=\rho_0}.$$

It immediately follows from Condition (d) that for any $\rho \in [\rho_0-\varepsilon,\rho_0+\varepsilon]$,

$$\int_0^T E\left[g^2(s,W_0^s,Y_0^s)\right]ds < \infty,$$

and moreover,

$$\int_0^T E\left[\sup_{\rho\in[\rho_0-\varepsilon,\rho_0+\varepsilon]}\left|2g(s,W_0^s,Y_0^s)\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right|\right]ds \le \int_0^T E\left[\sup_{\rho\in[\rho_0-\varepsilon,\rho_0+\varepsilon]}g^2(s,W_0^s,Y_0^s)\right]ds + \int_0^T E\left[\sup_{\rho\in[\rho_0-\varepsilon,\rho_0+\varepsilon]}\left(\frac{d}{d\rho}g(s,W_0^s,Y_0^s)\right)^2\right]ds < \infty.$$

The desired interchange then immediately follows from Lemma A.2.
2. We next prove that for any $\rho_0 \in \mathbb{R}_+$, we have, with probability 1,

$$\frac{d}{d\rho}\int\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T|w)\,\mu_W(dw)\bigg|_{\rho=\rho_0} = \int\frac{d}{d\rho}\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T|w)\,\mu_W(dw)\bigg|_{\rho=\rho_0}.$$

First of all, it follows from Theorem 7.1 of [24] that $\mu_Y \sim \mu_B$, and

$$\frac{d\mu_Y}{d\mu_B}(Y_0^T) = \int\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T|w)\,\mu_W(dw)$$

is finite almost surely, which can be further written as

$$\frac{d\mu_Y}{d\mu_B}(Y_0^T) = \int\exp\left(\rho^2\int_0^T g(s,W_0^s,Y_0^s)\,g(s,w_0^s,Y_0^s)\,ds + \rho\int_0^T g(s,w_0^s,Y_0^s)\,dB_s - \frac{\rho^2}{2}\int_0^T g^2(s,w_0^s,Y_0^s)\,ds\right)\mu_W(dw). \tag{31}$$

Emphasizing the dependence on $\rho$, we write

$$b(\rho) = \exp\left(\rho^2\int_0^T g(s,W_0^s,Y_0^s)\,g(s,w_0^s,Y_0^s)\,ds + \rho\int_0^T g(s,w_0^s,Y_0^s)\,dB_s - \frac{\rho^2}{2}\int_0^T g^2(s,w_0^s,Y_0^s)\,ds\right)$$

and write

$$a(\rho) = \int\frac{d}{d\rho}b(\rho)\,\mu_W(dw).$$

We now establish the continuity of $a(\rho)$ with respect to $\rho$. To see this, note that it follows from a routine estimation that

$$\lim_{\delta\to 0}E\left[\left|\frac{a(\rho+\delta)-a(\rho)}{\delta} - \int\frac{d^2}{d\rho^2}b(\rho)\,\mu_W(dw)\right|\right] = 0,$$

where we have used the boundedness of $g(s)$. This further implies that for any $\rho$, we have, with probability 1,

$$a(\rho)-a(0) = \int_0^\rho\int\frac{d^2}{d\gamma^2}b(\gamma)\,\mu_W(dw)\,d\gamma.$$

The continuity of $a(\rho)$ then immediately follows (or, more precisely, $a(\rho)$ has a continuous modification). Moreover, it is straightforward to verify that

$$E\left[\left|\frac{d}{d\rho}b(\rho)\right|\right] < \infty.$$

Finally, with all the technical conditions checked, the desired interchange immediately follows from Lemma A.1.

3. Finally, we will prove that for any $\rho_0 \in \mathbb{R}_+$,

$$\frac{d}{d\rho}E\left[\log\frac{d\mu_Y}{d\mu_B}(Y_0^T)\right]\bigg|_{\rho=\rho_0} = E\left[\frac{d}{d\rho}\log\frac{d\mu_Y}{d\mu_B}(Y_0^T)\bigg|_{\rho=\rho_0}\right].$$

First of all, we will show that for all $\rho \in [\rho_0-\varepsilon,\rho_0+\varepsilon]$, $E\left[\log\frac{d\mu_Y}{d\mu_B}(Y_0^T)\right]$ is finite. To see this, first note that it follows from Theorem 7.1 of [24] that

$$\frac{d\mu_Y}{d\mu_B}(Y_0^T) = \frac{1}{E\left[e^{-\int_0^T X\,dY+\frac{1}{2}\int_0^T X^2\,ds}\,\Big|\,Y_0^T\right]}.$$

By Jensen's inequality, we have

and, by the easy fact that $\log x < x$ for any $x > 0$,

Next, as in the proof of Theorem 4.2, we have

Note that

and furthermore,


Us)\Y 0 T ]]ds < 
It then follows from Condition (d) that

$$E\left[\int_{\rho_0-\varepsilon}^{\rho_0+\varepsilon}\left|\frac{d}{d\rho}\log\frac{d\mu_Y}{d\mu_B}(Y_0^T)\right|d\rho\right] < \infty.$$

Finally, with all the technical conditions checked, the desired interchange follows from Lemma A.1.